[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-12-04 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041799#comment-15041799
 ] 

Joep Rottinghuis commented on HDFS-8578:


Wrt. catching InterruptedException: even though the newer patches do 
distinguish between ExecutionException | CancellationException and other 
exceptions, it is still good form to leave the interrupt status intact by 
re-interrupting the current thread before returning:
{code}
Thread.currentThread().interrupt();
{code}
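As a sketch of that pattern (class and method names below are invented for illustration, not taken from the patch), catching the checked InterruptedException from Future.get() and restoring the interrupt flag could look like:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InterruptExample {
    /** Waits for all futures; returns false if the waiting thread was interrupted. */
    static boolean awaitAll(List<Future<?>> futures) {
        for (Future<?> f : futures) {
            try {
                f.get();
            } catch (ExecutionException | CancellationException e) {
                // The task itself failed or was cancelled; log and move on.
            } catch (InterruptedException e) {
                // Restore the interrupt status so callers can still observe it.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<?>> futures = new ArrayList<>();
        futures.add(pool.submit(() -> {}));
        futures.add(pool.submit(() -> {}));
        System.out.println(awaitAll(futures));  // true when nothing interrupts us
        pool.shutdown();
    }
}
```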

> On upgrade, Datanode should process all storage/data dirs in parallel
> -
>
> Key: HDFS-8578
> URL: https://issues.apache.org/jira/browse/HDFS-8578
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Raju Bairishetti
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, 
> HDFS-8578-03.patch, HDFS-8578-04.patch, HDFS-8578-05.patch, 
> HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, 
> HDFS-8578-09.patch, HDFS-8578-10.patch, HDFS-8578-11.patch, 
> HDFS-8578-branch-2.6.0.patch
>
>
> Right now, during upgrades the datanode processes all the storage dirs 
> sequentially. Assuming it takes ~20 mins to process a single storage dir, a 
> datanode which has ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
> for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>   assert getCTime() == nsInfo.getCTime()
>       : "Data-node and name-node CTimes must be the same.";
> }
> {code}
> It would save lots of time during major upgrades if the datanode processed all 
> storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-12-04 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041792#comment-15041792
 ] 

Joep Rottinghuis commented on HDFS-8578:


If I read the patch correctly, the parallelism will be the total number of 
storage directories:
{code}
int numParallelThreads = dataDirs.size();
{code}
With 12 disks and 3 namespaces that would mean 36 parallel threads, right?

Perhaps it would be better to make this configurable. It could default to 0, 
meaning as parallel as possible, or be set to any explicit value (up to the 
number of storage directories, although a newFixedThreadPool with more threads 
than tasks would simply not run more in parallel anyway). That way cluster 
admins can choose either to dial this all the way up, hammer the disks and page 
cache, and get it over with as soon as possible, or to tune it down a bit in 
case they choose to keep the NM up and executing tasks in the meantime.
I can imagine that 12 parallel threads (1 per disk in the above example) might 
turn out to be a reasonable compromise for some use cases.
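A minimal sketch of that idea (the config handling below is invented, not taken from the patch): treat 0 as "one thread per storage directory" and clamp any explicit setting, since extra idle threads buy nothing:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class UpgradeParallelism {
    /** 0 means "as parallel as possible"; anything else is clamped to the dir count. */
    static int poolSize(int configured, int numStorageDirs) {
        if (configured <= 0) {
            return numStorageDirs;               // dial it all the way up
        }
        // More threads than tasks would not run more in parallel anyway.
        return Math.min(configured, numStorageDirs);
    }

    public static void main(String[] args) {
        // 12 disks x 3 namespaces = 36 storage directories.
        System.out.println(poolSize(0, 36));     // prints 36
        System.out.println(poolSize(12, 36));    // prints 12 (1 per disk)
        ExecutorService pool = Executors.newFixedThreadPool(poolSize(12, 36));
        pool.shutdown();
    }
}
```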




[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-12-04 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041804#comment-15041804
 ] 

Joep Rottinghuis commented on HDFS-8578:


Or, in this case, perhaps yield and break out of the for loop, because the 
interrupt indicates that the thread should clean up and wrap up ASAP.



[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036421#comment-15036421
 ] 

Joep Rottinghuis commented on HDFS-8791:


Thanks [~ctrezzo], that seems like a reasonable compromise.
Thanks for the additional data points [~andrew.wang], that gives at least some 
comfort that 2.6.x without the patch isn't completely dead for adoption 
(although still at risk).

Aside from 2.6.2->2.6.3 being a surprising layout upgrade with this patch in 
2.6.3, we would also have to make it clear that you would not be able to go 
from 2.6.3 to 2.7.1 because the layout version would go backwards.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.





[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-01 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035317#comment-15035317
 ] 

Joep Rottinghuis commented on HDFS-8791:


Totally agree that if the patch goes into 2.6.3 it should go into 2.7.2 as well.

While I appreciate the sentiment that layout format changes would normally 
warrant a new minor release (2.8.0 in this case?), this approach leaves us with 
a dilemma.
We feel that we cannot move from 2.4 to 2.6, despite all of the efforts to 
validate and test, without this fix. Luckily we're in the position that we roll 
our own internal build, so technically we're not blocked on this.
We're already happy this fix will go in upstream.
That said, it would block us from rolling cleanly to 2.7.2+ without manually 
applying this patch.

Similarly, what do we tell other users? Don't use 2.6.3 or 2.6.4 because it has 
a fundamental perf problem? Then why even do a 2.6.3 maintenance release? Isn't 
the point of these releases that you can avoid manually applying a list of 
patches on top of a release?

Similarly, do we tell the HBase community to not use this version of Hadoop and 
just wait for a 2.8.x release and perhaps longer until that has been stabilized 
to the point where folks can run that comfortably in production knowing it has 
been battle tested?

A format release between dot releases is perhaps not ideal either, but if a 
release manager is willing to pull it in, and coordinate with the release of an 
equivalent maintenance release with a newer minor version, then is that not a 
practical workable outcome?




> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>





[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-11-20 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15019290#comment-15019290
 ] 

Joep Rottinghuis commented on HDFS-8791:


Probably not surprisingly, I'm a +1 (non-binding) for the patch, and thanks to 
[~ctrezzo] for his work to verify, measure, and write up the findings for this.

Not to pile on, but here is another mechanism by which heavy disk IO can impact 
a JVM: http://www.evanjones.ca/jvm-mmap-pause.html (disclaimer: we have not 
observed this exact interaction between the JVM writing hsperfdata and the disk 
layout in the wild; I just want to point out the possible connection).

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> HDFS-8791-trunk-v1.patch
>
>





[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-10-08 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949667#comment-14949667
 ] 

Joep Rottinghuis commented on HDFS-8791:


Concern about the impact of this issue is blocking us from rolling 2.6 to 
production clusters at the moment.
Federation and having 12 disks will likely make this worse:
256*256 directories * 4 namespaces * 12 disks = 3.1M directories, with only 
some directories containing 1 or perhaps 2 blocks, really does not seem to be a 
good idea.

I have a sense that the workaround of just not doing du and hoping for the best 
will suffice; however, find and similar commands will have the same impact.

Similarly, I think we need a command-line tool that takes a block (file) name 
and spits out the target directory. Administrators used to be able to move a 
block from any machine to any other one into any random directory and the DN 
would pick it up. That is no longer the case with the new layout.
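As an illustration of what such a tool would compute (the bit arithmetic below sketches a 256x256-style mapping; names and constants are assumptions, not copied from the actual DatanodeUtil code), the target leaf directory can be derived from the numeric block ID:

```java
public class BlockDirTool {
    /** Maps a numeric block ID to its "subdirX/subdirY" leaf directory. */
    static String idToBlockDir(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF);  // first-level dir (256 choices)
        int d2 = (int) ((blockId >> 8) & 0xFF);   // second-level dir (256 choices)
        return "subdir" + d1 + "/subdir" + d2;
    }

    public static void main(String[] args) {
        // A block file is named blk_<blockId>; strip the prefix and compute the dir.
        long id = Long.parseLong("blk_1073741825".substring("blk_".length()));
        System.out.println(idToBlockDir(id));  // prints subdir0/subdir0
    }
}
```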

[~ctrezzo] is looking further into how this is impacting performance in our 
environment.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Nathan Roberts
>Priority: Critical
>





[jira] [Updated] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-10-08 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-8791:
---
Affects Version/s: 2.8.0
   2.7.1

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Priority: Critical
>





[jira] [Commented] (HDFS-8898) Create API and command-line argument to get quota without need to get file and directory counts

2015-09-11 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741632#comment-14741632
 ] 

Joep Rottinghuis commented on HDFS-8898:


So it sounds like we're discussing two things here:
1) Getting the quota itself for a directory that a user has access to. There 
seems to be little security concerns with this.
2) Getting the quota, and the "ContentSummary" / count / usage for a directory 
that a user has access to, even if they might not have access to all the 
sub-directories. This is where [~jlowe] pointed out that there could be a 
potential security implication.

Even with yielding the NN lock, it seems the NN can still lock for ~1 sec per 
10M files in a sub-directory to check the entire sub-directory tree for 
permissions.
To address the potential security implications of 2) we could either make this 
a cluster-wide (final) config value, or do something with an extended attribute 
on the directory itself to allow or disallow traversal of a particular 
directory.

1) would give a huge performance boost for the cases when people just want to 
know what the quota is.
2) would give a huge performance boost for the cases when people want to know a 
quota plus what's left for large directories relatively high in the directory 
structure (let alone / on a huge namespace of many tens of millions of files).
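A hypothetical sketch of option 1)'s shape (all names below are invented for illustration; this is not the actual NameNode API): returning just the quota fields stored on the directory itself is a constant-time lookup, with no recursive traversal and no permission checks on sub-directories:

```java
import java.util.HashMap;
import java.util.Map;

public class QuotaLookup {
    static final long QUOTA_NOT_SET = -1;

    // Mock of the quota fields kept directly on each directory inode:
    // {namespace quota, space quota}.
    static final Map<String, long[]> QUOTAS = new HashMap<>();

    /** Returns {nsQuota, spaceQuota} for a directory, with no sub-tree traversal. */
    static long[] getQuotaOnly(String dir) {
        return QUOTAS.getOrDefault(dir, new long[] {QUOTA_NOT_SET, QUOTA_NOT_SET});
    }

    public static void main(String[] args) {
        // e.g. a 1M-name quota and a 10 TiB space quota on a home directory
        QUOTAS.put("/user/joep", new long[] {1_000_000L, 10L << 40});
        System.out.println(getQuotaOnly("/user/joep")[0]);  // prints 1000000
    }
}
```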

> Create API and command-line argument to get quota without need to get file 
> and directory counts
> ---
>
> Key: HDFS-8898
> URL: https://issues.apache.org/jira/browse/HDFS-8898
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Reporter: Joep Rottinghuis
>
> On large directory structures it takes significant time to iterate through 
> the file and directory counts recursively to get a complete ContentSummary.
> When you want to just check for the quota on a higher level directory it 
> would be good to have an option to skip the file and directory counts.
> Moreover, currently one can only check the quota if you have access to all 
> the directories underneath. For example, if I have a large home directory 
> under /user/joep and I host some files for another user in a sub-directory, 
> the moment they create an unreadable sub-directory under my home I can no 
> longer check what my quota is. Understood that I cannot check the current 
> file counts unless I can iterate through all the usage, but for 
> administrative purposes it is nice to be able to get the current quota 
> setting on a directory without the need to iterate through and run into 
> permission issues on sub-directories.





[jira] [Created] (HDFS-8898) Create API and command-line argument to get quota without need to get file and directory counts

2015-08-14 Thread Joep Rottinghuis (JIRA)
Joep Rottinghuis created HDFS-8898:
--

 Summary: Create API and command-line argument to get quota without 
need to get file and directory counts
 Key: HDFS-8898
 URL: https://issues.apache.org/jira/browse/HDFS-8898
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fs
Reporter: Joep Rottinghuis


On large directory structures it takes significant time to iterate through the 
file and directory counts recursively to get a complete ContentSummary.
When you want to just check for the quota on a higher level directory it would 
be good to have an option to skip the file and directory counts.

Moreover, currently one can only check the quota if you have access to all the 
directories underneath. For example, if I have a large home directory under 
/user/joep and I host some files for another user in a sub-directory, the 
moment they create an unreadable sub-directory under my home I can no longer 
check what my quota is. Understood that I cannot check the current file counts 
unless I can iterate through all the usage, but for administrative purposes it 
is nice to be able to get the current quota setting on a directory without the 
need to iterate through and run into permission issues on sub-directories.





[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-08-12 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694504#comment-14694504
 ] 

Joep Rottinghuis commented on HDFS-8791:


Seems related to (or perhaps dup of) HADOOP-10434.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Nathan Roberts
>Priority: Critical

 We are seeing cases where the new directory layout causes the datanode to 
 basically cause the disks to seek for 10s of minutes. This can be when the 
 datanode is running du, and it can also be when it is performing a 
 checkDirs(). Both of these operations currently scan all directories in the 
 block pool and that's very expensive in the new layout.
 The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
 leaf directories where block files are placed.
 So, what we have on disk is:
 - 256 inodes for the first level directories
 - 256 directory blocks for the first level directories
 - 256*256 inodes for the second level directories
 - 256*256 directory blocks for the second level directories
 - Then the inodes and blocks to store the the HDFS blocks themselves.
 The main problem is the 256*256 directory blocks. 
 Inodes and dentries will be cached by Linux, and one can configure how likely 
 the system is to prune those entries (vfs_cache_pressure). However, ext4 
 relies on the buffer cache to cache the directory blocks, and I'm not aware of 
 any way to tell Linux to favor buffer cache pages (even if it did, I'm not 
 sure I would want it to in general).
 Also, ext4 tries hard to spread directories evenly across the entire volume; 
 this basically means the 64K directory blocks are probably randomly spread 
 across the entire disk. A du-type scan will look at directories one at a 
 time, so the io scheduler can't optimize the corresponding seeks, meaning the 
 seeks will be random and far. 
 In a system I was using to diagnose this, I had 60K blocks. A DU when things 
 are hot is less than 1 second. When things are cold, about 20 minutes.
 How do things get cold?
 - A large set of tasks run on the node. This pushes almost all of the buffer 
 cache out, causing the next DU to hit this situation. We are seeing cases 
 where a large job can cause a seek storm across the entire cluster.
 Why didn't the previous layout see this?
 - It might have, but it wasn't nearly as pronounced. The previous layout had 
 only a few hundred directory blocks. Even when completely cold, these would 
 only take a few hundred seeks, which would mean single-digit seconds.  
 - With only a few hundred directories, the odds of the directory blocks 
 getting modified are quite high, which keeps those blocks hot and much less 
 likely to be evicted.
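To put rough numbers on the description above, here is a back-of-envelope sketch. The ~10 ms average random seek is an assumed figure for a cold spinning disk, not a number taken from the report:

```java
// Back-of-envelope for the 256x256 layout described above.
// Assumption: ~10 ms average random seek on a cold spinning disk.
public class LayoutSeekEstimate {
    public static void main(String[] args) {
        int leafDirs = 256 * 256;                 // 65,536 second-level directories
        double seekMs = 10.0;                     // assumed average random seek
        double coldScanMinutes = leafDirs * seekMs / 1000.0 / 60.0;
        System.out.println(leafDirs + " leaf dirs");
        // ~11 minutes of pure seeking: the same order of magnitude as the
        // ~20 minute cold du reported above.
        System.out.printf("cold scan ~%.0f minutes%n", coldScanMinutes);
    }
}
```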





[jira] [Commented] (HDFS-7877) Support maintenance state for datanodes

2015-07-21 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636255#comment-14636255
 ] 

Joep Rottinghuis commented on HDFS-7877:


What do we need to do to get this going (again) in OSS? Just FYI, we're moving 
forward with this at Twitter on production clusters.

 Support maintenance state for datanodes
 ---

 Key: HDFS-7877
 URL: https://issues.apache.org/jira/browse/HDFS-7877
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Ming Ma
 Attachments: HDFS-7877-2.patch, HDFS-7877.patch, 
 Supportmaintenancestatefordatanodes-2.pdf, 
 Supportmaintenancestatefordatanodes.pdf


 This requirement came up during the design for HDFS-7541. Given this feature 
 is mostly independent of upgrade domain feature, it is better to track it 
 under a separate jira. The design and draft patch will be available soon.





[jira] [Updated] (HDFS-5221) hftp: does not work with HA NN configuration

2013-10-08 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-5221:
---

Affects Version/s: 2.1.1-beta

 hftp: does not work with HA NN configuration
 

 Key: HDFS-5221
 URL: https://issues.apache.org/jira/browse/HDFS-5221
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, hdfs-client
Affects Versions: 2.0.5-alpha, 2.1.1-beta
Reporter: Joep Rottinghuis
Priority: Blocker

 When copying data between clusters of significantly different versions (say 
 from a Hadoop 1.x equivalent to Hadoop 2.x) we have to use hftp.
 When HA is configured, you have to point to a single (active) NN.
 Now, when the active NN becomes standby, the hftp: addresses will fail.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5277) hadoop fs -expunge does not work for federated namespace

2013-10-04 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786626#comment-13786626
 ] 

Joep Rottinghuis commented on HDFS-5277:


Interestingly enough, there is a workaround for the expunge: passing the URI 
to the fs command.
The help (both the docs and the output when typing hdfs dfs) does not seem to 
show that additional optional argument.
Without looking at the code, users won't know about this workaround.

 hadoop fs -expunge does not work for federated namespace 
 -

 Key: HDFS-5277
 URL: https://issues.apache.org/jira/browse/HDFS-5277
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
Reporter: Vrushali C

 We noticed that the hadoop fs -expunge command does not work across federated 
 namespaces. It seems to look only at /user/username/.Trash instead of 
 traversing all available namespaces and expunging from each one.





[jira] [Commented] (HDFS-5123) Hftp should support namenode logical service names in URI

2013-09-20 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773367#comment-13773367
 ] 

Joep Rottinghuis commented on HDFS-5123:


Fork-lifting comment 
https://issues.apache.org/jira/browse/HDFS-5221?focusedCommentId=13770133page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13770133

from HDFS-5221 over here to keep the discussion in one place.

We're thinking of two approaches:
1) Fix the hdfs-client so that, when hftp is used, it uses a retry mechanism 
to fail over to the active NN.
2) Use URL re-direction from the standby to the active NN.

Advantage of 1) is that it will work even if the (previously) active NN host is 
completely down, or if the NN process is not running at all.

Advantage of 2) is that it will give some benefits / resilience, even if Hadoop 
1.0 clients are not modified.
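A minimal sketch of what approach 1) could look like. This is purely illustrative; StandbyException, Fetcher, and fetchFromActive here are invented stand-ins, not the actual HDFS client API:

```java
import java.util.List;

// Illustrative sketch of approach 1): walk the configured NN addresses and
// fail over past any standby. All names here are stand-ins, not the real API.
public class HftpFailoverSketch {
    static class StandbyException extends RuntimeException {}

    interface Fetcher { String fetch(String nn); }

    static String fetchFromActive(List<String> namenodes, Fetcher f) {
        RuntimeException last = new StandbyException();
        for (String nn : namenodes) {
            try {
                return f.fetch(nn);
            } catch (StandbyException e) {
                last = e;  // standby NN: try the next one
            }
        }
        throw last;  // every NN refused: surface the last failure
    }

    public static void main(String[] args) {
        // Toy scenario: nn1 is standby, nn2 is active.
        Fetcher fake = nn -> {
            if (nn.equals("nn1")) throw new StandbyException();
            return "data-from-" + nn;
        };
        System.out.println(fetchFromActive(List.of("nn1", "nn2"), fake));  // data-from-nn2
    }
}
```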


 Hftp should support namenode logical service names in URI
 -

 Key: HDFS-5123
 URL: https://issues.apache.org/jira/browse/HDFS-5123
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.1.0-beta
Reporter: Arpit Gupta

 For example if the dfs.nameservices is set to arpit
 {code}
 hdfs dfs -ls hftp://arpit:50070/tmp
 or 
 hdfs dfs -ls hftp://arpit/tmp
 {code}
 does not work
 You have to provide the exact active namenode hostname. On an HA cluster 
 using dfs client one should not need to provide the active nn hostname

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5123) Hftp should support namenode logical service names in URI

2013-09-20 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773371#comment-13773371
 ] 

Joep Rottinghuis commented on HDFS-5123:


[~andrew.wang] saw your comment in HDFS-5221.
Seems approach 1) is great, but not enough to enable copying from a Hadoop 1.x 
cluster to a Hadoop 2.x cluster. For that we need approach 2).

 Hftp should support namenode logical service names in URI
 -

 Key: HDFS-5123
 URL: https://issues.apache.org/jira/browse/HDFS-5123
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.1.0-beta
Reporter: Arpit Gupta

 For example if the dfs.nameservices is set to arpit
 {code}
 hdfs dfs -ls hftp://arpit:50070/tmp
 or 
 hdfs dfs -ls hftp://arpit/tmp
 {code}
 does not work
 You have to provide the exact active namenode hostname. On an HA cluster 
 using dfs client one should not need to provide the active nn hostname



[jira] [Created] (HDFS-5226) Trash::moveToTrash doesn't work across multiple namespace

2013-09-18 Thread Joep Rottinghuis (JIRA)
Joep Rottinghuis created HDFS-5226:
--

 Summary: Trash::moveToTrash doesn't work across multiple namespace
 Key: HDFS-5226
 URL: https://issues.apache.org/jira/browse/HDFS-5226
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: federation
Affects Versions: 2.0.5-alpha
Reporter: Joep Rottinghuis
Priority: Blocker


Trash has introduced a new static method, moveToAppropriateTrash, which 
resolves to the right filesystem. To be API compatible we need to check 
whether Trash::moveToTrash can do what moveToAppropriateTrash does, so that 
downstream users need not change code.
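A toy illustration of the compatibility idea: the old entry point resolving the path's owning namespace before trashing, the way moveToAppropriateTrash does. The mount table, the /user/joep home, and the method shapes below are invented for illustration; this is not the real Trash/FileSystem API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration only: a namespace-aware moveToTrash that first resolves
// which namespace owns the path. Mount table and names are invented.
public class TrashCompatSketch {
    // toy "federation": path prefix -> owning namespace
    static final Map<String, String> MOUNTS = new LinkedHashMap<>();
    static {
        MOUNTS.put("/ns1", "hdfs://ns1");
        MOUNTS.put("/ns2", "hdfs://ns2");
    }

    static String resolveNamespace(String path) {
        for (Map.Entry<String, String> e : MOUNTS.entrySet()) {
            if (path.startsWith(e.getKey())) return e.getValue();
        }
        return "hdfs://default";
    }

    // old-style moveToTrash, made namespace-aware by delegating to the resolver
    static String moveToTrash(String path) {
        return resolveNamespace(path) + "/user/joep/.Trash/Current" + path;
    }

    public static void main(String[] args) {
        System.out.println(moveToTrash("/ns2/data/part-00000"));
        // -> hdfs://ns2/user/joep/.Trash/Current/ns2/data/part-00000
    }
}
```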



[jira] [Commented] (HDFS-5226) Trash::moveToTrash doesn't work across multiple namespace

2013-09-18 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771341#comment-13771341
 ] 

Joep Rottinghuis commented on HDFS-5226:


This manifested itself by Pig being unable to move directories to Trash.

 Trash::moveToTrash doesn't work across multiple namespace
 -

 Key: HDFS-5226
 URL: https://issues.apache.org/jira/browse/HDFS-5226
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: federation
Affects Versions: 2.0.5-alpha
Reporter: Joep Rottinghuis
Priority: Blocker

 Trash has introduced a new static method, moveToAppropriateTrash, which 
 resolves to the right filesystem. To be API compatible we need to check 
 whether Trash::moveToTrash can do what moveToAppropriateTrash does, so that 
 downstream users need not change code.



[jira] [Commented] (HDFS-4924) Show NameNode state on dfsclusterhealth page

2013-09-17 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770054#comment-13770054
 ] 

Joep Rottinghuis commented on HDFS-4924:


Marking as blocker. The current patch is definitely needed (one needs to be 
able to see which NN is the active one, particularly since drilling down into 
browsing the filesystem will otherwise fail 3 levels down).

Comments from [~cnauroth] are fair. We'll take a look at those and adjust 
accordingly.

 Show NameNode state on dfsclusterhealth page
 

 Key: HDFS-4924
 URL: https://issues.apache.org/jira/browse/HDFS-4924
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: federation
Affects Versions: 2.1.0-beta
Reporter: Lohit Vijayarenu
Assignee: Lohit Vijayarenu
Priority: Blocker
 Attachments: HDFS-4924.trunk.1.patch


 dfsclusterhealth.jsp shows a summary of multiple namenodes in the cluster. 
 With federation combined with HA it becomes difficult to quickly know the 
 state of the NameNodes in the cluster. It would be good to show whether a 
 NameNode is Active/Standby on the summary page.



[jira] [Updated] (HDFS-4924) Show NameNode state on dfsclusterhealth page

2013-09-17 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-4924:
---

Priority: Blocker  (was: Minor)

 Show NameNode state on dfsclusterhealth page
 

 Key: HDFS-4924
 URL: https://issues.apache.org/jira/browse/HDFS-4924
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: federation
Affects Versions: 2.1.0-beta
Reporter: Lohit Vijayarenu
Assignee: Lohit Vijayarenu
Priority: Blocker
 Attachments: HDFS-4924.trunk.1.patch


 dfsclusterhealth.jsp shows a summary of multiple namenodes in the cluster. 
 With federation combined with HA it becomes difficult to quickly know the 
 state of the NameNodes in the cluster. It would be good to show whether a 
 NameNode is Active/Standby on the summary page.



[jira] [Created] (HDFS-5221) hftp: does not work with HA NN configuration

2013-09-17 Thread Joep Rottinghuis (JIRA)
Joep Rottinghuis created HDFS-5221:
--

 Summary: hftp: does not work with HA NN configuration
 Key: HDFS-5221
 URL: https://issues.apache.org/jira/browse/HDFS-5221
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, hdfs-client
Affects Versions: 2.0.5-alpha
Reporter: Joep Rottinghuis
Priority: Blocker


When copying data between clusters of significantly different versions (say 
from a Hadoop 1.x equivalent to Hadoop 2.x) we have to use hftp.
When HA is configured, you have to point to a single (active) NN.

Now, when the active NN becomes standby, the hftp: addresses will fail.



[jira] [Commented] (HDFS-5221) hftp: does not work with HA NN configuration

2013-09-17 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770133#comment-13770133
 ] 

Joep Rottinghuis commented on HDFS-5221:


We're thinking of two approaches:
1) Fix the hdfs-client so that, when hftp is used, it uses a retry mechanism 
to fail over to the active NN.
2) Use URL re-direction from the standby to the active NN.

Advantage of 1) is that it will work even if the (previously) active NN host is 
completely down, or if the NN process is not running at all.

Advantage of 2) is that it will give _some_ benefits / resilience, even if 
Hadoop 1.0 clients are not modified.

 hftp: does not work with HA NN configuration
 

 Key: HDFS-5221
 URL: https://issues.apache.org/jira/browse/HDFS-5221
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, hdfs-client
Affects Versions: 2.0.5-alpha
Reporter: Joep Rottinghuis
Priority: Blocker

 When copying data between clusters of significantly different versions (say 
 from a Hadoop 1.x equivalent to Hadoop 2.x) we have to use hftp.
 When HA is configured, you have to point to a single (active) NN.
 Now, when the active NN becomes standby, the hftp: addresses will fail.



[jira] [Updated] (HDFS-2343) Make hdfs use same version of avro as HBase

2011-09-19 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-2343:
---

Attachment: HDFS-2343-branch-0.22.patch

Patch should go in together with patch for HADOOP-7646 and MAPREDUCE-3039.

 Make hdfs use same version of avro as HBase
 ---

 Key: HDFS-2343
 URL: https://issues.apache.org/jira/browse/HDFS-2343
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2343-branch-0.22.patch


 HBase depends on avro 1.5.3 whereas hadoop-common depends on 1.3.2.
 When building HBase on top of hadoop, this should be consistent.
 Moreover, this should be consistent between common, hdfs, and mapreduce.





[jira] [Commented] (HDFS-2341) Contribs not building

2011-09-17 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13107268#comment-13107268
 ] 

Joep Rottinghuis commented on HDFS-2341:


I see what happened, in the supplied patch I failed to remove two 
inheritAll=true parameters. Will upload an updated patch.

 Contribs not building
 -

 Key: HDFS-2341
 URL: https://issues.apache.org/jira/browse/HDFS-2341
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2341-branch-0.22.patch


 Contribs are not getting built.
 Snippet from Jenkins:
 compile:
[subant] No sub-builds to iterate on





[jira] [Updated] (HDFS-2341) Contribs not building

2011-09-17 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-2341:
---

Attachment: HDFS-2341-branch-0.22.patch

Konstantin, please roll back previous patch and apply new version.

 Contribs not building
 -

 Key: HDFS-2341
 URL: https://issues.apache.org/jira/browse/HDFS-2341
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2341-branch-0.22.patch, HDFS-2341-branch-0.22.patch


 Contribs are not getting built.
 Snippet from Jenkins:
 compile:
[subant] No sub-builds to iterate on





[jira] [Created] (HDFS-2341) Contribs not building

2011-09-16 Thread Joep Rottinghuis (JIRA)
Contribs not building
-

 Key: HDFS-2341
 URL: https://issues.apache.org/jira/browse/HDFS-2341
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0


Contribs are not getting built.
Snippet from Jenkins:

compile:
   [subant] No sub-builds to iterate on





[jira] [Commented] (HDFS-2341) Contribs not building

2011-09-16 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13106663#comment-13106663
 ] 

Joep Rottinghuis commented on HDFS-2341:


See: 
https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-Hdfs-22-branch/84/console

 Contribs not building
 -

 Key: HDFS-2341
 URL: https://issues.apache.org/jira/browse/HDFS-2341
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0


 Contribs are not getting built.
 Snippet from Jenkins:
 compile:
[subant] No sub-builds to iterate on





[jira] [Updated] (HDFS-2341) Contribs not building

2011-09-16 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-2341:
---

Status: Patch Available  (was: Open)

 Contribs not building
 -

 Key: HDFS-2341
 URL: https://issues.apache.org/jira/browse/HDFS-2341
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2341-branch-0.22.patch


 Contribs are not getting built.
 Snippet from Jenkins:
 compile:
[subant] No sub-builds to iterate on





[jira] [Updated] (HDFS-2341) Contribs not building

2011-09-16 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-2341:
---

Attachment: HDFS-2341-branch-0.22.patch

 Contribs not building
 -

 Key: HDFS-2341
 URL: https://issues.apache.org/jira/browse/HDFS-2341
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2341-branch-0.22.patch


 Contribs are not getting built.
 Snippet from Jenkins:
 compile:
[subant] No sub-builds to iterate on





[jira] [Created] (HDFS-2343) Make hdfs use same version of avro as HBase

2011-09-16 Thread Joep Rottinghuis (JIRA)
Make hdfs use same version of avro as HBase
---

 Key: HDFS-2343
 URL: https://issues.apache.org/jira/browse/HDFS-2343
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0


HBase depends on avro 1.5.3 whereas hadoop-common depends on 1.3.2.
When building HBase on top of hadoop, this should be consistent.
Moreover, this should be consistent between common, hdfs, and mapreduce.





[jira] [Commented] (HDFS-2341) Contribs not building

2011-09-16 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13107034#comment-13107034
 ] 

Joep Rottinghuis commented on HDFS-2341:


The mapred one was clean.
It seems the hdfs one is failing on the clean target. Will look into that and 
provide a fix.

 Contribs not building
 -

 Key: HDFS-2341
 URL: https://issues.apache.org/jira/browse/HDFS-2341
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2341-branch-0.22.patch


 Contribs are not getting built.
 Snippet from Jenkins:
 compile:
[subant] No sub-builds to iterate on





[jira] [Commented] (HDFS-2189) guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.

2011-09-06 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098393#comment-13098393
 ] 

Joep Rottinghuis commented on HDFS-2189:


The Hadoop-Hdfs-22-branch build on Jenkins fails with the following:
BUILD FAILED
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-22-branch/trunk/build.xml:1288:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-22-branch/trunk/src/contrib/build.xml:60:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-22-branch/trunk/src/contrib/fuse-dfs/build.xml:22:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-22-branch/trunk/src/contrib/build-contrib.xml:68:
 Source resource does not exist: 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-22-branch/trunk/src/contrib/ivy/libraries.properties

When I revert the version of Ant on my desktop from 1.8.1 to 1.6.5 I see the 
same issue.
However, it seems that HDFS does not build with 1.6.5 anyway:
BUILD FAILED
/home/jrottinghuis/git/hadoop-common/hdfs/build.xml:1545: Class 
org.apache.tools.ant.taskdefs.ConditionTask doesn't support the nested 
typefound element.
This is supposed to be fixed with ant 1.7.0.

I suspect with the Jenkins slaves failing, there is now a different version of 
ant on the current slave.
What version of ant is supposed to be used on the apache Jenkins servers? 
That way I can install the same version and make sure the builds go through.

 guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.
 -

 Key: HDFS-2189
 URL: https://issues.apache.org/jira/browse/HDFS-2189
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Plamen Jeliazkov
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2189-1.patch, patch.txt


 Corrected version of: https://issues.apache.org/jira/browse/MAPREDUCE-2627





[jira] [Created] (HDFS-2315) Build fails with ant 1.7.0 but works with 1.8.0

2011-09-06 Thread Joep Rottinghuis (JIRA)
Build fails with ant 1.7.0 but works with 1.8.0
---

 Key: HDFS-2315
 URL: https://issues.apache.org/jira/browse/HDFS-2315
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: Ubuntu 11.04; Sun JDK 1.6.0_26; Ant 1.8.2; Ant 1.7.0
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0


Build failure:
https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-Hdfs-22-branch/80
build.xml calls build.xml in contrib, which calls fuse build, which in turn 
uses build-contrib.
The inheritAll=true overrides the basedir in ant 1.7.0 but not in 1.8.0.





[jira] [Resolved] (HDFS-2189) guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.

2011-09-06 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis resolved HDFS-2189.


Resolution: Fixed

I opened a separate bug for the build issue, which is related to the version of 
Ant.
See: HDFS-2315

 guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.
 -

 Key: HDFS-2189
 URL: https://issues.apache.org/jira/browse/HDFS-2189
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Plamen Jeliazkov
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2189-1.patch, patch.txt


 Corrected version of: https://issues.apache.org/jira/browse/MAPREDUCE-2627





[jira] [Commented] (HDFS-2315) Build fails with ant 1.7.0 but works with 1.8.0

2011-09-06 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098438#comment-13098438
 ] 

Joep Rottinghuis commented on HDFS-2315:


Jenkins log:
==
==
BUILD: ant clean tar mvn-deploy findbugs -Dtest.junit.output.format=xml 
-Dcompile.c++=true -Dcompile.native=true -Dfindbugs.home=$FINDBUGS_HOME 
-Dforrest.home=$FORREST_HOME -Dclover.home=$CLOVER_HOME 
-Declipse.home=$ECLIPSE_HOME
==
==


Buildfile: build.xml

clean-contrib:

clean:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-22-branch/trunk/build.xml:1288:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-22-branch/trunk/src/contrib/build.xml:60:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-22-branch/trunk/src/contrib/fuse-dfs/build.xml:22:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-22-branch/trunk/src/contrib/build-contrib.xml:68:
 Source resource does not exist: 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-22-branch/trunk/src/contrib/ivy/libraries.properties

Total time: 0 seconds

 Build fails with ant 1.7.0 but works with 1.8.0
 ---

 Key: HDFS-2315
 URL: https://issues.apache.org/jira/browse/HDFS-2315
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: Ubuntu 11.04; Sun JDK 1.6.0_26; Ant 1.8.2; Ant 1.7.0
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0


 Build failure:
 https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-Hdfs-22-branch/80
 build.xml calls build.xml in contrib, which calls fuse build, which in turn 
 uses build-contrib.
 The inheritAll=true overrides the basedir in ant 1.7.0 but not in 1.8.0.





[jira] [Updated] (HDFS-2315) Build fails with ant 1.7.0 but works with 1.8.0

2011-09-06 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-2315:
---

Status: Patch Available  (was: Open)

 Build fails with ant 1.7.0 but works with 1.8.0
 ---

 Key: HDFS-2315
 URL: https://issues.apache.org/jira/browse/HDFS-2315
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: Ubuntu 11.04; Sun JDK 1.6.0_26; Ant 1.8.2; Ant 1.7.0
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2315.patch


 Build failure:
 https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-Hdfs-22-branch/80
 build.xml calls build.xml in contrib, which calls fuse build, which in turn 
 uses build-contrib.
 The inheritAll=true overrides the basedir in ant 1.7.0 but not in 1.8.0.





[jira] [Updated] (HDFS-2315) Build fails with ant 1.7.0 but works with 1.8.0

2011-09-06 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-2315:
---

Attachment: HDFS-2315.patch

Passing basedir explicitly to the contrib calls should override its local 
setting.
This is a bug in ant 1.6.5 and 1.7.0, but not in ant 1.8.0. The fix works for 
all versions.

 Build fails with ant 1.7.0 but works with 1.8.0
 ---

 Key: HDFS-2315
 URL: https://issues.apache.org/jira/browse/HDFS-2315
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: Ubuntu 11.04; Sun JDK 1.6.0_26; Ant 1.8.2; Ant 1.7.0
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2315.patch


 Build failure:
 https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-Hdfs-22-branch/80
 build.xml calls build.xml in contrib, which calls fuse build, which in turn 
 uses build-contrib.
 The inheritAll=true overrides the basedir in ant 1.7.0 but not in 1.8.0.





[jira] [Reopened] (HDFS-2189) guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.

2011-09-04 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis reopened HDFS-2189:


  Assignee: Joep Rottinghuis  (was: Plamen Jeliazkov)

The code has been fixed, but the mvn:publish target must be run to publish the 
jar and POM. Downstream builds are failing.


 guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.
 -

 Key: HDFS-2189
 URL: https://issues.apache.org/jira/browse/HDFS-2189
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Plamen Jeliazkov
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2189-1.patch, patch.txt


 Corrected version of: https://issues.apache.org/jira/browse/MAPREDUCE-2627

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2189) guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.

2011-09-02 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096242#comment-13096242
 ] 

Joep Rottinghuis commented on HDFS-2189:


Integration build should be kicked because the last published POM still have 
the erroneous reference to org.apache.hadooip#guava. This is failing downstream 
builds. See HBASE-4327

 guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.
 -

 Key: HDFS-2189
 URL: https://issues.apache.org/jira/browse/HDFS-2189
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Plamen Jeliazkov
Assignee: Plamen Jeliazkov
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2189-1.patch, patch.txt


 Corrected version of: https://issues.apache.org/jira/browse/MAPREDUCE-2627

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2297) FindBugs OutOfMemoryError

2011-08-29 Thread Joep Rottinghuis (JIRA)
FindBugs OutOfMemoryError
-

 Key: HDFS-2297
 URL: https://issues.apache.org/jira/browse/HDFS-2297
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: FindBugs 1.3.9, ant 1.8.2, RHEL6, Jenkins 1.414 in Tomcat 
7.0.14, Sun Java HotSpot(TM) 64-Bit Server VM
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker


When running the findbugs target from Jenkins, I get an OutOfMemory error.
The effort in FindBugs is set to Max which ends up using a lot of memory to 
go through all the classes. The jvmargs passed to FindBugs is hardcoded to 512 
MB max.

We can leave the default at 512M, as long as we pass this as an ant parameter 
which can be overridden in individual cases through -D, or in the 
build.properties file (in either the basedir or the user's home directory).
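The override proposed above — a 512M default that a -D option or build.properties entry can replace — can be mimicked in plain shell to show the intended precedence. FINDBUGS_HEAP_SIZE is a hypothetical stand-in for whatever ant property the patch introduces:

```shell
#!/bin/sh
# Default-with-override, mirroring ant property precedence: the first
# definition wins, so a caller-supplied value replaces the 512M default.
# FINDBUGS_HEAP_SIZE is a hypothetical name, not the actual ant property.
heap="${FINDBUGS_HEAP_SIZE:-512M}"
echo "findbugs heap: ${heap}"
```

Run plainly this prints 512M; prefixed with FINDBUGS_HEAP_SIZE=1024M it prints 1024M, analogous to passing -D on the ant command line.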


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2297) FindBugs OutOfMemoryError

2011-08-29 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093004#comment-13093004
 ] 

Joep Rottinghuis commented on HDFS-2297:


findbugs:
[mkdir] Created dir: 
/hadoop01/jenkins/jobs/hadoop-hdfs-test/workspace/hdfs/build/test/findbugs
 [findbugs] Executing findbugs from ant task
 [findbugs] Running FindBugs...
 [findbugs] Out of memory
 [findbugs] Total memory: 477M
 [findbugs]  free memory: 65M
 [findbugs] Analyzed: 
/hadoop01/jenkins/jobs/hadoop-hdfs-test/workspace/hdfs/build/hadoop-hdfs-0.22-joep-0.1.jar
 [findbugs]  Aux: 
/hadoop01/jenkins/jobs/hadoop-hdfs-test/workspace/hdfs/build/ivy/lib/Hadoop-Hdfs/common/ant-1.6.5.jar
 ...lines cut
 [findbugs]  Aux: 
/hadoop01/jenkins/jobs/hadoop-hdfs-test/workspace/hdfs/build/ivy/lib/Hadoop-Hdfs/common/paranamer-ant-2.2.jar
 [findbugs]  Aux: 
/hadoop01/jenkins/jobs/hadoop-hdfs-test/workspace/hdfs/build/ivy/lib/Hadoop-Hdfs/common/paranamer-generator-2.2.jar
 [findbugs]  Aux: 
/hadoop01/jenkins/jobs/hadoop-hdfs-test/workspace/hdfs/build/ivy/lib/Hadoop-Hdfs/common/qdox-1.10.1.jar
 [findbugs]  Aux: 
/hadoop01/jenkins/jobs/hadoop-hdfs-test/workspace/hdfs/build/ivy/lib/Hadoop-Hdfs/common/servlet-api-2.5-6.1.14.jar
 [findbugs]  Aux: 
/hadoop01/jenkins/jobs/hadoop-hdfs-test/workspace/hdfs/build/ivy/lib/Hadoop-Hdfs/common/slf4j-api-1.5.11.jar
 [findbugs]  Aux: 
/hadoop01/jenkins/jobs/hadoop-hdfs-test/workspace/hdfs/build/ivy/lib/Hadoop-Hdfs/common/xmlenc-0.52.jar
 [findbugs] Exception in thread "main" java.lang.OutOfMemoryError: GC overhead 
limit exceeded
 [findbugs] at 
edu.umd.cs.findbugs.ba.type.TypeAnalysis.createFact(TypeAnalysis.java:291)
 [findbugs] at 
edu.umd.cs.findbugs.ba.type.TypeAnalysis.getCachedExceptionSet(TypeAnalysis.java:688)
 [findbugs] at 
edu.umd.cs.findbugs.ba.type.TypeAnalysis.computeThrownExceptionTypes(TypeAnalysis.java:438)
 [findbugs] at 
edu.umd.cs.findbugs.ba.type.TypeAnalysis.transfer(TypeAnalysis.java:410)
 [findbugs] at 
edu.umd.cs.findbugs.ba.type.TypeAnalysis.transfer(TypeAnalysis.java:88)
 [findbugs] at edu.umd.cs.findbugs.ba.Dataflow.execute(Dataflow.java:356)
 [findbugs] at 
edu.umd.cs.findbugs.classfile.engine.bcel.TypeDataflowFactory.analyze(TypeDataflowFactory.java:82)
 ... 27 more
 [findbugs] Java Result: 1

 FindBugs OutOfMemoryError
 -

 Key: HDFS-2297
 URL: https://issues.apache.org/jira/browse/HDFS-2297
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: FindBugs 1.3.9, ant 1.8.2, RHEL6, Jenkins 1.414 in 
 Tomcat 7.0.14, Sun Java HotSpot(TM) 64-Bit Server VM
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker

 When running the findbugs target from Jenkins, I get an OutOfMemory error.
 The effort in FindBugs is set to Max which ends up using a lot of memory to 
 go through all the classes. The jvmargs passed to FindBugs is hardcoded to 
 512 MB max.
 We can leave the default at 512M, as long as we pass this as an ant parameter 
 which can be overridden in individual cases through -D, or in the 
 build.properties file (in either the basedir or the user's home directory).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2297) FindBugs OutOfMemoryError

2011-08-29 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-2297:
---

Attachment: HDFS-2297.patch

 FindBugs OutOfMemoryError
 -

 Key: HDFS-2297
 URL: https://issues.apache.org/jira/browse/HDFS-2297
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: FindBugs 1.3.9, ant 1.8.2, RHEL6, Jenkins 1.414 in 
 Tomcat 7.0.14, Sun Java HotSpot(TM) 64-Bit Server VM
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker
 Attachments: HDFS-2297.patch


 When running the findbugs target from Jenkins, I get an OutOfMemory error.
 The effort in FindBugs is set to Max which ends up using a lot of memory to 
 go through all the classes. The jvmargs passed to FindBugs is hardcoded to 
 512 MB max.
 We can leave the default to 512M, as long as we pass this as an ant parameter 
 which can be overwritten in individual cases through -D, or in the 
 build.properties file (either basedir, or user's home directory).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HDFS-2211) Build does not pass along properties to contrib builds

2011-08-22 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis reassigned HDFS-2211:
--

Assignee: Joep Rottinghuis

 Build does not pass along properties to contrib builds
 --

 Key: HDFS-2211
 URL: https://issues.apache.org/jira/browse/HDFS-2211
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04
 Sun JRE 1.6
 Ant 1.8.2
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2211.patch


 Subant calls to compile contribs do not pass along parameters from the parent 
 build.
 Properties such as hadoop-common.version, asfrepo, offline, etc. are not 
 passed along.
 The result is that a build without Internet connectivity fails: hdfs proxy 
 refuses to build against its own recently built common and instead downloads 
 0.22-SNAPSHOT from apache again.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HDFS-2214) Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT

2011-08-22 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis reassigned HDFS-2214:
--

Assignee: Joep Rottinghuis

 Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT
 ---

 Key: HDFS-2214
 URL: https://issues.apache.org/jira/browse/HDFS-2214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04; Sun JDK 1.6_016 & Sun JDK 
 1.6.0_26; Ant 1.8.2
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
 Fix For: 0.22.0

 Attachments: HDFS-2214-follow-up.patch, HDFS-2214.patch, 
 HDFS-2214.patch, HDFS-2214.patch, HDFS-2214.patch


 The generated poms inject the version of hdfs itself, but hardcode the version 
 of hadoop-common they depend on.
 When trying to build downstream projects for example mapreduce, then they 
 will require hadoop-common-0.22.0-SNAPSHOT.jar.
 When trying to do an offline build this will fail to resolve as another 
 hadoop-common has been installed in the local maven repo.
 Even during online build, it should compile against the hadoop-common that 
 hdfs compiled against.
 When versions mismatch one cannot do a coherent build. That is particularly 
 problematic when making simultaneous change in hadoop-common and hadoop-hdfs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2214) Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT

2011-08-18 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086990#comment-13086990
 ] 

Joep Rottinghuis commented on HDFS-2214:


Ok, I will create a new patch this morning that will apply cleanly after 
rolling back the initial one.

 Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT
 ---

 Key: HDFS-2214
 URL: https://issues.apache.org/jira/browse/HDFS-2214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04; Sun JDK 1.6_016 & Sun JDK 
 1.6.0_26; Ant 1.8.2
Reporter: Joep Rottinghuis
 Fix For: 0.22.0

 Attachments: HDFS-2214-follow-up.patch, HDFS-2214.patch, 
 HDFS-2214.patch, HDFS-2214.patch


 The generated poms inject the version of hdfs itself, but hardcode the version 
 of hadoop-common they depend on.
 When trying to build downstream projects for example mapreduce, then they 
 will require hadoop-common-0.22.0-SNAPSHOT.jar.
 When trying to do an offline build this will fail to resolve as another 
 hadoop-common has been installed in the local maven repo.
 Even during online build, it should compile against the hadoop-common that 
 hdfs compiled against.
 When versions mismatch one cannot do a coherent build. That is particularly 
 problematic when making simultaneous change in hadoop-common and hadoop-hdfs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2214) Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT

2011-08-17 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086581#comment-13086581
 ] 

Joep Rottinghuis commented on HDFS-2214:


Somehow this patch seems to have rolled back HDFS-2189; it looks like we 
dropped the guava dependency again. I'll have to check today and verify.
If indeed we did, how do we rectify that situation?
Do we want to roll back this patch so that I can attach a proper one that does 
not roll back the guava change?

 Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT
 ---

 Key: HDFS-2214
 URL: https://issues.apache.org/jira/browse/HDFS-2214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04; Sun JDK 1.6_016 & Sun JDK 
 1.6.0_26; Ant 1.8.2
Reporter: Joep Rottinghuis
 Fix For: 0.22.0

 Attachments: HDFS-2214.patch, HDFS-2214.patch, HDFS-2214.patch


 The generated poms inject the version of hdfs itself, but hardcode the version 
 of hadoop-common they depend on.
 When trying to build downstream projects for example mapreduce, then they 
 will require hadoop-common-0.22.0-SNAPSHOT.jar.
 When trying to do an offline build this will fail to resolve as another 
 hadoop-common has been installed in the local maven repo.
 Even during online build, it should compile against the hadoop-common that 
 hdfs compiled against.
 When versions mismatch one cannot do a coherent build. That is particularly 
 problematic when making simultaneous change in hadoop-common and hadoop-hdfs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (HDFS-2214) Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT

2011-08-17 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis reopened HDFS-2214:



Need to add back in the guava dependency that was inadvertently dropped. Will 
attach follow-up patch for this.

 Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT
 ---

 Key: HDFS-2214
 URL: https://issues.apache.org/jira/browse/HDFS-2214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04; Sun JDK 1.6_016 & Sun JDK 
 1.6.0_26; Ant 1.8.2
Reporter: Joep Rottinghuis
 Fix For: 0.22.0

 Attachments: HDFS-2214.patch, HDFS-2214.patch, HDFS-2214.patch


 The generated poms inject the version of hdfs itself, but hardcode the version 
 of hadoop-common they depend on.
 When trying to build downstream projects for example mapreduce, then they 
 will require hadoop-common-0.22.0-SNAPSHOT.jar.
 When trying to do an offline build this will fail to resolve as another 
 hadoop-common has been installed in the local maven repo.
 Even during online build, it should compile against the hadoop-common that 
 hdfs compiled against.
 When versions mismatch one cannot do a coherent build. That is particularly 
 problematic when making simultaneous change in hadoop-common and hadoop-hdfs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2214) Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT

2011-08-17 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-2214:
---

Status: Patch Available  (was: Reopened)

HDFS-2214.patch rolls back the HDFS-2189 fix which it should not have. 
HDFS-2214-follow-up.patch rectifies this.

 Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT
 ---

 Key: HDFS-2214
 URL: https://issues.apache.org/jira/browse/HDFS-2214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04; Sun JDK 1.6_016 & Sun JDK 
 1.6.0_26; Ant 1.8.2
Reporter: Joep Rottinghuis
 Fix For: 0.22.0

 Attachments: HDFS-2214-follow-up.patch, HDFS-2214.patch, 
 HDFS-2214.patch, HDFS-2214.patch


 The generated poms inject the version of hdfs itself, but hardcode the version 
 of hadoop-common they depend on.
 When trying to build downstream projects for example mapreduce, then they 
 will require hadoop-common-0.22.0-SNAPSHOT.jar.
 When trying to do an offline build this will fail to resolve as another 
 hadoop-common has been installed in the local maven repo.
 Even during online build, it should compile against the hadoop-common that 
 hdfs compiled against.
 When versions mismatch one cannot do a coherent build. That is particularly 
 problematic when making simultaneous change in hadoop-common and hadoop-hdfs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2211) Build does not pass along properties to contrib builds

2011-08-06 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080497#comment-13080497
 ] 

Joep Rottinghuis commented on HDFS-2211:


Hi,
Thanks for your mail. I'll be out of the office until Wednesday August 17th.
During this time I will not be checking my e-mail.
Thanks,
Joep


 Build does not pass along properties to contrib builds
 --

 Key: HDFS-2211
 URL: https://issues.apache.org/jira/browse/HDFS-2211
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04
 Sun JRE 1.6
 Ant 1.8.2
Reporter: Joep Rottinghuis
Priority: Blocker
 Attachments: HDFS-2211.patch


 Subant calls to compile contribs do not pass along parameters from the parent 
 build.
 Properties such as hadoop-common.version, asfrepo, offline, etc. are not 
 passed along.
 The result is that a build without Internet connectivity fails: hdfs proxy 
 refuses to build against its own recently built common and instead downloads 
 0.22-SNAPSHOT from apache again.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2189) guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.

2011-07-29 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073029#comment-13073029
 ] 

Joep Rottinghuis commented on HDFS-2189:


I was asked why this patch has to be rolled back.
This patch adds a dependency on org.apache.hadoop#guava version r09.
The modified file is used to generate a POM.
In downstream builds this POM is used to determine what other dependencies need 
to be pulled in when the hadoop-hdfs jar is pulled in.
The problem is that org.apache.hadoop#guava does not exist; if anything it 
should be com.google.guava#guava.
In other words, the group ID is wrong.

Moreover, it is not clear this is the correct fix for the original mapreduce 
compilation error in the first place. It is likely that the proper fix is to 
put the dependency in the template for mapreduce (resulting in the guava jar 
being downloaded into the lib directory for the mapreduce build). Preliminary 
tests show this to be the case, but I need to do further testing.

One way or the other, this patch does not work and causes downstream problems.
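To illustrate the group ID problem, the snippet below writes an assumed version of the bad ivy dependency line and corrects its org attribute; the rev value and exact attribute set in the real template may differ:

```shell
#!/bin/sh
# Write the (assumed) bad ivy dependency entry, then fix the org attribute
# from the nonexistent org.apache.hadoop group to com.google.guava.
printf '<dependency org="org.apache.hadoop" name="guava" rev="r09"/>\n' > guava-dep.xml
sed 's/org="org\.apache\.hadoop"/org="com.google.guava"/' guava-dep.xml
# → <dependency org="com.google.guava" name="guava" rev="r09"/>
```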

 guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.
 -

 Key: HDFS-2189
 URL: https://issues.apache.org/jira/browse/HDFS-2189
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Plamen Jeliazkov
Assignee: Plamen Jeliazkov
Priority: Blocker
 Fix For: 0.22.0

 Attachments: patch.txt


 Corrected version of: https://issues.apache.org/jira/browse/MAPREDUCE-2627

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2189) guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.

2011-07-29 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-2189:
---

Status: Patch Available  (was: Reopened)

 guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.
 -

 Key: HDFS-2189
 URL: https://issues.apache.org/jira/browse/HDFS-2189
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Plamen Jeliazkov
Assignee: Plamen Jeliazkov
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2189-1.patch, patch.txt


 Corrected version of: https://issues.apache.org/jira/browse/MAPREDUCE-2627

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2189) guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.

2011-07-29 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-2189:
---

Attachment: HDFS-2189-1.patch

Attaching new patch (2189-1).
Indeed hdfs depends on guava.
Specifically, o.a.h.hdfs.SocketCache imports the com.google.* classes.
Note that I generated this patch after applying the HDFS-2214 patch first.


 guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.
 -

 Key: HDFS-2189
 URL: https://issues.apache.org/jira/browse/HDFS-2189
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Plamen Jeliazkov
Assignee: Plamen Jeliazkov
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-2189-1.patch, patch.txt


 Corrected version of: https://issues.apache.org/jira/browse/MAPREDUCE-2627

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2211) Build does not pass along properties to contrib builds

2011-07-28 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072443#comment-13072443
 ] 

Joep Rottinghuis commented on HDFS-2211:


Should have used code markup, as the pound comment sign was translated into 
numbered-list 1.'s.

 Build does not pass along properties to contrib builds
 --

 Key: HDFS-2211
 URL: https://issues.apache.org/jira/browse/HDFS-2211
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04
 Sun JRE 1.6
 Ant 1.8.2
Reporter: Joep Rottinghuis
Priority: Minor
 Attachments: HDFS-2211.patch


 Subant calls to compile contribs do not pass along parameters from the parent 
 build.
 Properties such as hadoop-common.version, asfrepo, offline, etc. are not 
 passed along.
 The result is that a build without Internet connectivity fails: hdfs proxy 
 refuses to build against its own recently built common and instead downloads 
 0.22-SNAPSHOT from apache again.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2211) Build does not pass along properties to contrib builds

2011-07-28 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072441#comment-13072441
 ] 

Joep Rottinghuis commented on HDFS-2211:


To build hdfs without Internet connectivity:
1) First do a build with connectivity, or copy ~/.ivy2 from a machine where 
such a build did succeed.
2) Set up a handful of files in a local repo to avoid downloads from the Maven 
repo; have these three files available:
/home/user/buildrepo/dist/commons/daemon/binaries/1.0.2/linux/commons-daemon-1.0.2-bin-linux-i386.tar.gz
/home/user/buildrepo/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
/home/user/buildrepo/org/apache/maven/maven-ant-tasks/2.0.10/maven-ant-tasks-2.0.10.jar
3) Before the build, run these commands to copy the jars in place for ant to 
resolve (otherwise you have a bootstrap problem):
cp ${HOME}/buildrepo/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar 
${WORKSPACE}/hdfs/ivy/
cp 
${HOME}/buildrepo/org/apache/maven/maven-ant-tasks/2.0.10/maven-ant-tasks-2.0.10.jar
 ${WORKSPACE}/hdfs/ivy/

4) Do a build of common (using similar tricks to this if needed). Execute the 
mvn-install target to publish your own version to the local Maven repo 
(~/.m2/repository).
5) Set these properties in your build.properties (the version can be passed in 
from Jenkins as a parameter instead when this patch is applied; 
build.properties can sit in the user's home directory or in the hdfs directory):

# Ivy dependency resolution instruction:
resolvers=internal

#you can increment this number as you see fit and/or pass the ${BUILD_NUMBER} 
from Jenkins
build.number=0
version=0.22-coolname-${build.number}
project.version=${version}
hadoop.version=${version}

# Note that hadoop-core from 0.20* branches is renamed to hadoop-common
hadoop-common.version=${version}
hadoop-hdfs.version=${version}

# Specify to not download ivy. When used, must provide ivy jar ourselves.
offline=true

# Instead of reaching out to the Internet, pull the ivy jar from the local repo
mvnrepo=file:${user.home}/buildrepo
# Used by hadoop-common/hdfs and hadoop-common/mapreduce
mvn.repo=file:${user.home}/buildrepo
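In a Jenkins job, the build.properties above can be written by a shell build step so that the BUILD_NUMBER parameter flows in automatically. A sketch — the "coolname" label and paths are placeholders from the comment, not requirements:

```shell
#!/bin/sh
# Generate build.properties for an offline build. ${BUILD_NUMBER} comes from
# Jenkins (defaults to 0 locally); the ant-style ${...} references are kept
# literal via backslash escapes so ant resolves them at build time.
BUILD_NUMBER="${BUILD_NUMBER:-0}"
cat > build.properties <<EOF
# Ivy dependency resolution instruction:
resolvers=internal
build.number=${BUILD_NUMBER}
version=0.22-coolname-\${build.number}
project.version=\${version}
hadoop.version=\${version}
hadoop-common.version=\${version}
hadoop-hdfs.version=\${version}
offline=true
mvnrepo=file:\${user.home}/buildrepo
mvn.repo=file:\${user.home}/buildrepo
EOF
echo "wrote build.properties (build ${BUILD_NUMBER})"
```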

 Build does not pass along properties to contrib builds
 --

 Key: HDFS-2211
 URL: https://issues.apache.org/jira/browse/HDFS-2211
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04
 Sun JRE 1.6
 Ant 1.8.2
Reporter: Joep Rottinghuis
Priority: Minor
 Attachments: HDFS-2211.patch


 Subant calls to compile contribs do not pass along parameters from the parent 
 build.
 Properties such as hadoop-common.version, asfrepo, offline, etc. are not 
 passed along.
 The result is that a build without Internet connectivity fails: hdfs proxy 
 refuses to build against its own recently built common and instead downloads 
 0.22-SNAPSHOT from apache again.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2214) Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT

2011-07-28 Thread Joep Rottinghuis (JIRA)
Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT
---

 Key: HDFS-2214
 URL: https://issues.apache.org/jira/browse/HDFS-2214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04; Sun JDK 1.6_016 & Sun JDK 
1.6.0_26; Ant 1.8.2
Reporter: Joep Rottinghuis


The generated poms inject the version of hdfs itself, but hardcode the version 
of hadoop-common they depend on.
When trying to build downstream projects for example mapreduce, then they will 
require hadoop-common-0.22.0-SNAPSHOT.jar.

When trying to do an offline build this will fail to resolve as another 
hadoop-common has been installed in the local maven repo.
Even during online build, it should compile against the hadoop-common that hdfs 
compiled against.

When versions mismatch one cannot do a coherent build. That is particularly 
problematic when making simultaneous change in hadoop-common and hadoop-hdfs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2214) Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT

2011-07-28 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-2214:
---

Attachment: HDFS-2214.patch

 Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT
 ---

 Key: HDFS-2214
 URL: https://issues.apache.org/jira/browse/HDFS-2214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04; Sun JDK 1.6_016 & Sun JDK 
 1.6.0_26; Ant 1.8.2
Reporter: Joep Rottinghuis
 Attachments: HDFS-2214.patch


 The generated poms inject the version of hdfs itself, but hardcode the version 
 of hadoop-common they depend on.
 When trying to build downstream projects for example mapreduce, then they 
 will require hadoop-common-0.22.0-SNAPSHOT.jar.
 When trying to do an offline build this will fail to resolve as another 
 hadoop-common has been installed in the local maven repo.
 Even during online build, it should compile against the hadoop-common that 
 hdfs compiled against.
 When versions mismatch one cannot do a coherent build. That is particularly 
 problematic when making simultaneous change in hadoop-common and hadoop-hdfs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2214) Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT

2011-07-28 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-2214:
---

Status: Patch Available  (was: Open)

 Generated POMs hardcode dependency on hadoop-common version 0.22.0-SNAPSHOT
 ---

 Key: HDFS-2214
 URL: https://issues.apache.org/jira/browse/HDFS-2214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04; Sun JDK 1.6_016 & Sun JDK 
 1.6.0_26; Ant 1.8.2
Reporter: Joep Rottinghuis
 Attachments: HDFS-2214.patch


 The generated poms inject the version of hdfs itself, but hardcode the version 
 of hadoop-common they depend on.
 When trying to build downstream projects for example mapreduce, then they 
 will require hadoop-common-0.22.0-SNAPSHOT.jar.
 When trying to do an offline build this will fail to resolve as another 
 hadoop-common has been installed in the local maven repo.
 Even during online build, it should compile against the hadoop-common that 
 hdfs compiled against.
 When versions mismatch one cannot do a coherent build. That is particularly 
 problematic when making simultaneous change in hadoop-common and hadoop-hdfs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2189) guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.

2011-07-28 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072644#comment-13072644
 ] 

Joep Rottinghuis commented on HDFS-2189:


I need to do a little more work to determine whether the dependency should 
really be in the template of mapreduce, or in hdfs.
In either case this patch needs to be reverted because it has the wrong group 
ID: org.apache.hadoop

Downstream builds such as mapreduce will fail with an error that 
org.apache.hadoop#guava cannot be resolved.
The correct group ID should be com.google.guava.

 guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.
 -

 Key: HDFS-2189
 URL: https://issues.apache.org/jira/browse/HDFS-2189
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Plamen Jeliazkov
Assignee: Plamen Jeliazkov
Priority: Blocker
 Fix For: 0.22.0

 Attachments: patch.txt


 Corrected version of: https://issues.apache.org/jira/browse/MAPREDUCE-2627

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2189) guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.

2011-07-28 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072645#comment-13072645
 ] 

Joep Rottinghuis commented on HDFS-2189:


After Apache pushes a new 0.22.0-SNAPSHOT of the HDFS jars to the Maven repo, you 
may need to clear out the local maven repo (~/.m2) and ivy cache (~/.ivy2) to get 
downstream builds working properly again.
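
A minimal sketch of that clean-up, assuming the default Maven and Ivy cache 
locations; CACHE_HOME is a hypothetical override introduced here only so the 
snippet can be pointed at a scratch directory instead of your real home:

```shell
# Remove stale org.apache.hadoop artifacts from the local Maven and Ivy caches
# so the next build resolves the freshly published 0.22.0-SNAPSHOT jars.
# CACHE_HOME is a hypothetical override; it defaults to the real home directory.
CACHE_HOME="${CACHE_HOME:-$HOME}"
rm -rf "$CACHE_HOME/.m2/repository/org/apache/hadoop" \
       "$CACHE_HOME/.ivy2/cache/org.apache.hadoop"
echo "cleared cached org.apache.hadoop artifacts"
```

Clearing only the org.apache.hadoop subtrees (rather than all of ~/.m2 and 
~/.ivy2) avoids re-downloading every unrelated dependency.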

 guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.
 -

 Key: HDFS-2189
 URL: https://issues.apache.org/jira/browse/HDFS-2189
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Plamen Jeliazkov
Assignee: Plamen Jeliazkov
Priority: Blocker
 Fix For: 0.22.0

 Attachments: patch.txt


 Corrected version of: https://issues.apache.org/jira/browse/MAPREDUCE-2627

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1935) Build should not redownload ivy on every invocation

2011-07-27 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071988#comment-13071988
 ] 

Joep Rottinghuis commented on HDFS-1935:


I solved this in a different way. Do a one-time build to prime the ~/.ivy2 
directory and/or copy this from a machine with Internet access.

I set the mvn.repo property in ~/build.properties (or pass it in as a -D 
option). Note that in common this property is called mvnrepo (no dot).
mvn.repo=file:/home/jrottinghuis/buildrepo
Then I have two files:
/home/jrottinghuis/buildrepo/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
/home/jrottinghuis/buildrepo/org/apache/maven/maven-ant-tasks/2.0.10/maven-ant-tasks-2.0.10.jar

The former is needed for all targets, the latter only if you want to use the 
mvn-install or mvn-publish targets.

One other bootstrap problem with this is that the ivy and maven-ant-tasks jars 
cannot be found. I therefore manually copy both jars into hadoop-common/hdfs/ivy 
(also hadoop-common/common/ivy and hadoop-common/mapreduce). In Jenkins I have a 
simple build step for this.
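
The one-time priming step described above could be sketched as follows. REPO and 
the touch placeholders are illustrative assumptions; in a real setup you would 
copy the actual jars from a machine with Internet access:

```shell
# Sketch: lay out a local file:// repo with the two bootstrap jars, using the
# paths and versions mentioned in this comment. REPO is a hypothetical override.
REPO="${REPO:-$HOME/buildrepo}"
mkdir -p "$REPO/org/apache/ivy/ivy/2.1.0" \
         "$REPO/org/apache/maven/maven-ant-tasks/2.0.10"
# touch is used only to illustrate the layout; copy the real jars in practice.
touch "$REPO/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar"
touch "$REPO/org/apache/maven/maven-ant-tasks/2.0.10/maven-ant-tasks-2.0.10.jar"
echo "primed $REPO"
```

The build then resolves both jars from mvn.repo=file:$REPO without touching the 
network.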

There is still a problem though: the compile-contrib target makes a subant call, 
and that call does not pass along properties. That is problematic even when one 
sets other properties (for example hadoop-common.version). I'll file a separate 
bug for this.

 Build should not redownload ivy on every invocation
 ---

 Key: HDFS-1935
 URL: https://issues.apache.org/jira/browse/HDFS-1935
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Priority: Trivial
  Labels: newbie
 Fix For: 0.22.0

 Attachments: diff, hdfs-1935.patch, hdfs-1935.txt


 Currently we re-download ivy every time we build. If the jar already exists, 
 we should skip this.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1935) Build should not redownload ivy on every invocation

2011-07-27 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071992#comment-13071992
 ] 

Joep Rottinghuis commented on HDFS-1935:


Forgot to mention that I pass 
resolvers=internal
offline=true

in the build.properties as well.
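
Putting the properties from this thread together, a ~/build.properties for an 
offline build might look like the following sketch (the mvn.repo path is the 
example given in an earlier comment, not a required location):

```properties
# Sketch of ~/build.properties for building without Internet access.
mvn.repo=file:/home/jrottinghuis/buildrepo
resolvers=internal
offline=true
```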

 Build should not redownload ivy on every invocation
 ---

 Key: HDFS-1935
 URL: https://issues.apache.org/jira/browse/HDFS-1935
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Priority: Trivial
  Labels: newbie
 Fix For: 0.22.0

 Attachments: diff, hdfs-1935.patch, hdfs-1935.txt


 Currently we re-download ivy every time we build. If the jar already exists, 
 we should skip this.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2211) Build does not pass along properties to contrib builds

2011-07-27 Thread Joep Rottinghuis (JIRA)
Build does not pass along properties to contrib builds
--

 Key: HDFS-2211
 URL: https://issues.apache.org/jira/browse/HDFS-2211
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04
Sun JRE 1.6
Ant 1.8.2
Reporter: Joep Rottinghuis
Priority: Minor


Subant call to compile contribs do not pass along parameters from parent build.
Properties such as hadoop-common.version, asfrepo, offline, etc. are not passed 
along.
Result is that build not connected to Internet fails, hdfs proxy refuses to 
build against own recently built common but rather downloads 0.22-SNAPSHOT from 
apache again.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2211) Build does not pass along properties to contrib builds

2011-07-27 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072000#comment-13072000
 ] 

Joep Rottinghuis commented on HDFS-2211:


The subant calls in src/contrib/build.xml and src/contrib/build-contrib.xml 
suffer from the same problem. There is one antcall in build-contrib.xml with the 
same issue.
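
One way such a fix could look, as a sketch: forward the relevant properties into 
the contrib sub-builds explicitly, since Ant's subant task does not inherit 
properties unless told to. The property names below are the ones mentioned in 
this issue; the target name and fileset are assumptions:

```xml
<!-- Hypothetical sketch: forward build properties into contrib sub-builds. -->
<subant target="compile">
  <propertyset>
    <propertyref name="hadoop-common.version"/>
    <propertyref name="asfrepo"/>
    <propertyref name="offline"/>
    <propertyref name="resolvers"/>
  </propertyset>
  <fileset dir="src/contrib" includes="*/build.xml"/>
</subant>
```

Alternatively, inheritall="true" on subant forwards all properties at once, at 
the cost of leaking properties the sub-builds may not expect.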

 Build does not pass along properties to contrib builds
 --

 Key: HDFS-2211
 URL: https://issues.apache.org/jira/browse/HDFS-2211
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04
 Sun JRE 1.6
 Ant 1.8.2
Reporter: Joep Rottinghuis
Priority: Minor

 Subant call to compile contribs do not pass along parameters from parent 
 build.
 Properties such as hadoop-common.version, asfrepo, offline, etc. are not 
 passed along.
 Result is that build not connected to Internet fails, hdfs proxy refuses to 
 build against own recently built common but rather downloads 0.22-SNAPSHOT 
 from apache again.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2211) Build does not pass along properties to contrib builds

2011-07-27 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071999#comment-13071999
 ] 

Joep Rottinghuis commented on HDFS-2211:


Make that HDFS-1935

 Build does not pass along properties to contrib builds
 --

 Key: HDFS-2211
 URL: https://issues.apache.org/jira/browse/HDFS-2211
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04
 Sun JRE 1.6
 Ant 1.8.2
Reporter: Joep Rottinghuis
Priority: Minor

 Subant call to compile contribs do not pass along parameters from parent 
 build.
 Properties such as hadoop-common.version, asfrepo, offline, etc. are not 
 passed along.
 Result is that build not connected to Internet fails, hdfs proxy refuses to 
 build against own recently built common but rather downloads 0.22-SNAPSHOT 
 from apache again.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2211) Build does not pass along properties to contrib builds

2011-07-27 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071998#comment-13071998
 ] 

Joep Rottinghuis commented on HDFS-2211:


This is related to MAPREDUCE-1935

 Build does not pass along properties to contrib builds
 --

 Key: HDFS-2211
 URL: https://issues.apache.org/jira/browse/HDFS-2211
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04
 Sun JRE 1.6
 Ant 1.8.2
Reporter: Joep Rottinghuis
Priority: Minor

 Subant call to compile contribs do not pass along parameters from parent 
 build.
 Properties such as hadoop-common.version, asfrepo, offline, etc. are not 
 passed along.
 Result is that build not connected to Internet fails, hdfs proxy refuses to 
 build against own recently built common but rather downloads 0.22-SNAPSHOT 
 from apache again.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1935) Build should not redownload ivy on every invocation

2011-07-27 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072015#comment-13072015
 ] 

Joep Rottinghuis commented on HDFS-1935:


Three files are needed locally to build w/o Internet connection.

Set the following property
mvn.repo=file:/home/user/buildrepo

and have three files available:
/home/user/buildrepo/dist/commons/daemon/binaries/1.0.2/linux/commons-daemon-1.0.2-bin-linux-i386.tar.gz
/home/user/buildrepo/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
/home/user/buildrepo/org/apache/maven/maven-ant-tasks/2.0.10/maven-ant-tasks-2.0.10.jar
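
A quick sanity check, as a sketch, that the three artifacts listed above are in 
place before kicking off an offline build; MVN_REPO is a hypothetical override 
defaulting to the path from this comment:

```shell
# Verify the three artifacts needed for an offline build exist in the repo.
REPO="${MVN_REPO:-/home/user/buildrepo}"
missing=0
for f in \
  dist/commons/daemon/binaries/1.0.2/linux/commons-daemon-1.0.2-bin-linux-i386.tar.gz \
  org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar \
  org/apache/maven/maven-ant-tasks/2.0.10/maven-ant-tasks-2.0.10.jar
do
  if [ -f "$REPO/$f" ]; then
    echo "ok: $f"
  else
    echo "missing: $f"
    missing=$((missing + 1))
  fi
done
echo "$missing artifact(s) missing"
```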


 Build should not redownload ivy on every invocation
 ---

 Key: HDFS-1935
 URL: https://issues.apache.org/jira/browse/HDFS-1935
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Priority: Trivial
  Labels: newbie
 Fix For: 0.22.0

 Attachments: diff, hdfs-1935.patch, hdfs-1935.txt


 Currently we re-download ivy every time we build. If the jar already exists, 
 we should skip this.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2211) Build does not pass along properties to contrib builds

2011-07-27 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HDFS-2211:
---

Attachment: HDFS-2211.patch

 Build does not pass along properties to contrib builds
 --

 Key: HDFS-2211
 URL: https://issues.apache.org/jira/browse/HDFS-2211
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
 Environment: RHEL 6.1 & Ubuntu 11.04
 Sun JRE 1.6
 Ant 1.8.2
Reporter: Joep Rottinghuis
Priority: Minor
 Attachments: HDFS-2211.patch


 Subant call to compile contribs do not pass along parameters from parent 
 build.
 Properties such as hadoop-common.version, asfrepo, offline, etc. are not 
 passed along.
 Result is that build not connected to Internet fails, hdfs proxy refuses to 
 build against own recently built common but rather downloads 0.22-SNAPSHOT 
 from apache again.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1935) Build should not redownload ivy on every invocation

2011-07-27 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072137#comment-13072137
 ] 

Joep Rottinghuis commented on HDFS-1935:


With the properties I described and the patch in HDFS-2211 (and without the 
patch attached to this Jira) I can successfully build hdfs on a machine w/o 
Internet connectivity.

 Build should not redownload ivy on every invocation
 ---

 Key: HDFS-1935
 URL: https://issues.apache.org/jira/browse/HDFS-1935
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Priority: Trivial
  Labels: newbie
 Fix For: 0.22.0

 Attachments: diff, hdfs-1935.patch, hdfs-1935.txt


 Currently we re-download ivy every time we build. If the jar already exists, 
 we should skip this.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira