Re: MiniCluster and provided scope dependencies
On Tue, Sep 24, 2013 at 11:57 AM, Josh Elser josh.el...@gmail.com wrote: I'm curious to hear what people think on this. I'm a really big fan of spinning up a minicluster instance to do some more realistic testing of software as I write it. With 1.5.0, it's a bit more painful because I have to add a bunch more dependencies to my project (which previously only had to depend on the accumulo-minicluster artifact). The list includes, but is likely not limited to: commons-io, commons-configuration, hadoop-client, zookeeper, log4j, slf4j-api, and slf4j-log4j12. As best I understand it, the intent of this was that Hadoop will typically provide these artifacts at runtime, and therefore Accumulo doesn't need to re-bundle them itself, which I'd agree with (not getting into that whole issue with the Hadoop ecosystem). However, I would think that the minicluster should have non-provided-scope dependencies declared on these, as there is no Hadoop installation --

Would this require declaring dependencies on a particular version of Hadoop in the minicluster pom? Or could the minicluster pom have profiles for different Hadoop versions? I do not know enough about Maven to know whether you can use profiles declared in a dependency (e.g., if a user depends on minicluster, can they activate profiles in it?)

there's just the minicluster. As such, this would spare users from having to dig into our dependency management, or resort to trial and error, to figure out what extra dependencies they have to include in their project to actually make it work. Thoughts? - Josh
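[Editor's sketch of the 1.5.0 situation Josh describes: because the minicluster's dependencies are marked 'provided', a user's pom must re-declare them by hand. All version numbers here are illustrative, not prescriptive.]

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-minicluster</artifactId>
    <version>1.5.0</version>
  </dependency>
  <!-- Because minicluster marks these 'provided', the user must also
       declare them explicitly or the minicluster will not start: -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>1.2.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.zookeeper</groupId>
    <artifactId>zookeeper</artifactId>
    <version>3.4.5</version>
  </dependency>
  <!-- ...plus commons-io, commons-configuration, log4j,
       slf4j-api, and slf4j-log4j12, per the list above -->
</dependencies>
```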
On Tue, Sep 24, 2013 at 12:31 PM, Keith Turner ke...@deenlo.com wrote: Would this require declaring dependencies on a particular version of Hadoop in the minicluster pom? Or could the minicluster pom have profiles for different Hadoop versions?

The actual dependency in minicluster is against Apache Hadoop, but that's beside the point. Marking the hadoop-client dependency as 'provided' means that Hadoop's dependencies are *not* included at runtime (because hadoop-client is provided and, as such, so are its dependencies). In other words, this is completely independent of what's actually included in a distribution of Hadoop when you download and install it. Apache Hadoop has dependencies we need to run the minicluster. By marking the hadoop-client artifact as 'provided', we do not get its dependencies and the minicluster fails to run.

I think this is easy enough to work around by overriding the dependencies we need to run the minicluster in the minicluster module (e.g., making hadoop-client not 'provided' in the minicluster module). Thus, as we add more things to the minicluster that require other libraries, we control the dependency management instead of forcing that onto the user. - Josh
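[Editor's note: a minimal sketch of the override Josh describes, inside the minicluster module's own pom. This assumes the parent pom sets 'provided' scope via dependencyManagement, which the module-level declaration then overrides.]

```xml
<!-- minicluster/pom.xml: re-declare hadoop-client at compile scope so its
     transitive dependencies reach consumers of accumulo-minicluster,
     overriding the 'provided' scope assumed to come from the parent. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <scope>compile</scope>
</dependency>
```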
+1. I remember kind of having this discussion in June, because I wanted to be able to run the minicluster as a single-node Accumulo using the start package. I like this approach better. 1.6.0 provides a main method for firing up the minicluster, and having the dependencies in the pom will allow developers to fire it up without needing Hadoop/ZooKeeper installed. ACCUMULO-1405 https://issues.apache.org/jira/browse/ACCUMULO-1405

On Tue, Sep 24, 2013 at 12:48 PM, Josh Elser josh.el...@gmail.com wrote: I think this is easy enough to work around by overriding the dependencies we need to run the minicluster in the minicluster module (e.g., making hadoop-client not 'provided' in the minicluster module).
On Tue, Sep 24, 2013 at 12:48 PM, Josh Elser josh.el...@gmail.com wrote: I think this is easy enough to work around by overriding the dependencies we need to run the minicluster in the minicluster module (e.g., making hadoop-client not 'provided' in the minicluster module).

So if we mark hadoop-client as not provided, then we have to choose a version? How easy will it be for a user to choose a different version of Hadoop for their testing? I am trying to understand what impact this would have on a user's pom that depends on Hadoop 2 if minicluster depends on Hadoop 1.2.
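[Editor's note: a hedged sketch of Keith's scenario. Maven's "nearest wins" dependency mediation means a direct declaration in the user's pom overrides the version pulled in transitively; an explicit exclusion makes the intent unambiguous. Versions are illustrative.]

```xml
<!-- User pom that wants Hadoop 2 while accumulo-minicluster
     transitively brings in Hadoop 1.x. -->
<dependencies>
  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-minicluster</artifactId>
    <version>1.5.0</version>
    <exclusions>
      <!-- Drop the transitive Hadoop 1.x client -->
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <!-- Direct declaration wins over the transitive version -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
  </dependency>
</dependencies>
```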
Oh, I see your point now. For hadoop1 versus hadoop2 we would just use the same profiles that we have in place. We could look into using a classifier when deploying these artifacts so users can pull down a version of minicluster that is compatible with hadoop2 without forcing them to build it themselves. Given that we already *have* hadoop-1.x listed as the default dependency, I don't really see that as being an issue.

On Tue, Sep 24, 2013 at 12:58 PM, Keith Turner ke...@deenlo.com wrote: So if we mark hadoop-client as not provided, then we have to choose a version? How easy will it be for a user to choose a different version of Hadoop for their testing?
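[Editor's note: from the consumer's side, Josh's classifier idea would look roughly like this. The 'hadoop2' classifier is hypothetical; no such artifact was being published at the time, and the version shown is illustrative.]

```xml
<!-- Hypothetical: pulling a Hadoop 2-compatible build of the
     minicluster via a classifier, as floated above. -->
<dependency>
  <groupId>org.apache.accumulo</groupId>
  <artifactId>accumulo-minicluster</artifactId>
  <version>1.6.0</version>
  <classifier>hadoop2</classifier>
</dependency>
```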
Hadoop 2 actually has a specific minicluster component/pom, which is what you get to depend on there. Its dependencies are still going to be a mess until someone sits down to fix HADOOP-9991, but at least now you can be slightly more selective.

On 24 September 2013 18:20, Josh Elser josh.el...@gmail.com wrote: We could look into using a classifier when deploying these artifacts so users can pull down a version of minicluster that is compatible with hadoop2 without forcing them to build it themselves.
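[Editor's note: the component Steve refers to is published as the org.apache.hadoop:hadoop-minicluster artifact, which aggregates the test-cluster classes (MiniDFSCluster and friends). Version and scope here are illustrative.]

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-minicluster</artifactId>
  <version>2.2.0</version>
  <scope>test</scope>
</dependency>
```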
Steve, thanks for the insight -- I think I may have caused some confusion. Previously, I was only referring to the Accumulo minicluster module. We have some profiles in Accumulo for hadoop1 and hadoop2 which attempt to insulate us from upstream dependency fun. In the released versions of the Accumulo minicluster, I believe we only have support to use the local FS (no Hadoop minicluster). I think in Accumulo 1.6 we have support for our Accumulo minicluster to use the Hadoop minicluster. This is all good information to have as we figure out what's best. The input is appreciated!

On Tue, Sep 24, 2013 at 1:55 PM, Steve Loughran ste...@hortonworks.com wrote: Hadoop 2 actually has a specific minicluster component/pom, which is what you get to depend on there.
MiniAccumuloCluster has the option of starting HDFS, since LocalFS doesn't support flush and all the WAL tests fail on it. -Eric

On Tue, Sep 24, 2013 at 2:14 PM, Josh Elser josh.el...@gmail.com wrote: In the released versions of the Accumulo minicluster, I believe we only have support to use the local FS (no Hadoop minicluster). I think in Accumulo 1.6 we have support for our Accumulo minicluster to use the Hadoop minicluster.
Being lazy: do we have one that encompasses this issue already? Is there a good parent for me to piggy-back onto?

On Tue, Sep 24, 2013 at 2:20 PM, Christopher ctubb...@apache.org wrote: I agree. The provided stuff was done mainly to drive our packaging in 1.5, not to cater to Maven developers. There are a few open tickets about this for 1.6. -- Christopher L Tubbs II http://gravatar.com/ctubbsii
I don't think we should do that. Artifacts shouldn't be deployed multiple times with different POMs for different dependencies. (I'm 100% positive we'd get a scolding from Benson for that.) The point of MAC is to test Accumulo, not Hadoop, and the additional classifiers add a lot of complexity to the build. I think some of this could be improved via the accumulo-maven-plugin. You can manipulate plugin dependencies easily enough in Maven right now, and it would be trivial for users to override the a-m-p dependency on hadoop-client. (http://blog.sonatype.com/people/2008/04/how-to-override-a-plugins-dependency-in-maven/) -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, Sep 24, 2013 at 1:20 PM, Josh Elser josh.el...@gmail.com wrote: We could look into using a classifier when deploying these artifacts so users can pull down a version of minicluster that is compatible with hadoop2 without forcing them to build it themselves.
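[Editor's note: the plugin-dependency override Christopher links to looks roughly like this from a user's pom. Maven lets a project replace a dependency declared in a plugin's own pom by listing it under the plugin's <dependencies>. The accumulo-maven-plugin coordinates and versions here are assumed for illustration.]

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.accumulo</groupId>
      <artifactId>accumulo-maven-plugin</artifactId>
      <version>1.6.0</version>
      <dependencies>
        <!-- Replaces the hadoop-client version declared
             in the plugin's own pom -->
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
          <version>2.2.0</version>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>
```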
I agree. The provided stuff was done mainly to drive our packaging in 1.5, not to cater to Maven developers. There are a few open tickets about this for 1.6. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, Sep 24, 2013 at 11:57 AM, Josh Elser josh.el...@gmail.com wrote: I would think that the minicluster should have non-provided-scope dependencies declared on these, as there is no Hadoop installation -- there's just the minicluster.