from:"Gour Saha"

My Apache issues account

2014-06-30 Thread Gour Saha

Team,

Here is my Apache issues username - gsaha

Please assign me any newbie JIRAs.

-Gour

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: excluding build output from app package

2014-07-08 Thread Gour Saha

A quick, temporary way to get past the rat check failures for a single mvn
run is to add -Drat.ignoreErrors to the command line. But I agree with Sumit
that in the long term we should extend the exclusion list.
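
To see which of the reported files a candidate exclusion would cover before
touching the pom, here is a rough Python sketch. It only approximates
Ant-style globs with a regular expression and is not the matcher RAT itself
uses. The function names and the `**/target/**` pattern are illustrative
assumptions, not something proposed in this thread.

```python
import re


def ant_glob_to_regex(pattern):
    """Translate a simplified Ant-style glob into a compiled regex.

    Assumed semantics: '**/' spans zero or more directories, '**' spans
    anything, '*' and '?' stay within one path segment.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def uncovered(paths, patterns):
    """Return the paths that none of the exclusion patterns match."""
    regexes = [ant_glob_to_regex(p) for p in patterns]
    return [p for p in paths if not any(r.fullmatch(p) for r in regexes)]


if __name__ == "__main__":
    reported = [
        "app-packages/hbase/target/archive-tmp/appConfig.json.1746534752.filtered",
        "app-packages/hbase/target/failsafe-reports/failsafe-summary.xml",
    ]
    # Hypothetical candidate pattern; prints whatever it leaves uncovered.
    print(uncovered(reported, ["**/target/**"]))
```

Anything the function returns would still be flagged, so it is a cheap way
to try a pattern before running the full build.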

-Gour


On Tue, Jul 8, 2014 at 10:59 AM, Jon Maron  wrote:

> The pom (root dir) specifies a list of exclusions:
>
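>   <plugin>
>     <groupId>org.apache.rat</groupId>
>     <artifactId>apache-rat-plugin</artifactId>
>     <version>${apache-rat-plugin.version}</version>
>     <executions>
>       <execution>
>         <id>check-licenses</id>
>         <goals>
>           <goal>check</goal>
>         </goals>
>       </execution>
>     </executions>
>     <configuration>
>       <excludes>
>         <exclude>**/*.json</exclude>
>         <exclude>**/*.tar</exclude>
>         <exclude>**/build.properties</exclude>
>         <exclude>**/regionservers</exclude>
>         <exclude>**/slaves</exclude>
>         <exclude>**/httpfs-signature.secret</exclude>
>         <exclude>**/dfs.exclude</exclude>
>         <exclude>**/*.iml</exclude>
>         <exclude>**/rat.txt</exclude>
>         <exclude>DISCLAIMER</exclude>
>       </excludes>
>     </configuration>
>   </plugin>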
>
>   Try adding your set (e.g. ‘**/TEST*.xml’)
>
> On Jul 8, 2014, at 1:53 PM, Ted Yu  wrote:
>
> > Hi,
> > Once I produce app package for hbase, the next build would complain about
> > the following.
> >
> > Unapproved licenses:
> >
> >  app-packages/hbase/target/archive-tmp/appConfig.json.1746534752.filtered
> >  app-packages/hbase/target/archive-tmp/appConfig.json.727813455.filtered
> >  app-packages/hbase/target/maven-status/maven-compiler-plugin/testCompile/default-testCompile/createdFiles.lst
> >  app-packages/hbase/target/maven-status/maven-compiler-plugin/testCompile/default-testCompile/inputFiles.lst
> >  app-packages/hbase/target/failsafe-reports/TEST-org.apache.slider.funtest.hbase.HBaseMonitorSSLIT.xml
> >  app-packages/hbase/target/failsafe-reports/TEST-org.apache.slider.funtest.hbase.HBaseBasicIT.xml
> >  app-packages/hbase/target/failsafe-reports/org.apache.slider.funtest.hbase.HBaseMonitorSSLIT.txt
> >  app-packages/hbase/target/failsafe-reports/org.apache.slider.funtest.hbase.HBaseBasicIT.txt
> >  app-packages/hbase/target/failsafe-reports/failsafe-summary.xml
> >
> > Suggestions on how this build output should be excluded from the rat
> > check are welcome.
> >
> > Thanks
>
>
>


Re: HBase functional tests during "mvn test"

2014-07-09 Thread Gour Saha

Sumit,

This is how jenkins is running it (
https://builds.apache.org/job/Slider-develop/84/consoleText) -

[Slider-develop] $ /home/hudson/tools/maven/apache-maven-3.0.4/bin/mvn clean -Dslider.conf.dir=/home/jenkins/jenkins-slave/workspace/Slider-develop/src/test/clusters/offline/slider/ package

It is providing the slider.conf.dir on the command line.

-Gour



On Wed, Jul 9, 2014 at 12:44 PM, Sumit Mohanty 
wrote:

> So it seems that the issue is due to static initializer
>
>public static final String SLIDER_CONF_DIR =
> sysprop(SLIDER_CONF_DIR_PROP)
>
> and when running "mvn test" we do not provide "slider.conf.dir" thus the
> code complains.
>
> Presumably it was working fine, since jenkins had a good build as recently
> as 2 days ago.
>
> Do you know if there is an obvious fix, before I investigate further?
>
> -Sumit
>
>
> On Wed, Jul 9, 2014 at 12:09 PM, Jon Maron  wrote:
>
> > I am seeing the same behavior…
> >
> > On Jul 9, 2014, at 2:51 PM, Sumit Mohanty 
> > wrote:
> >
> > > This is on a clean VM.
> > >
> > > When I run "mvn test" I see these failures. I assume the
> > >
> > > Slider HBase Provider Functional Tests
> > > Slider Accumulo Provider Functional Tests
> > >
> > > should get skipped. Am I not issuing the correct command, or is there a
> > > bug?
> > >
> > > -Sumit
> > >
> > > Tests in error:
> > >
> > >   TestFunctionalHBaseCluster.org.apache.slider.providers.hbase.funtest.TestFunctionalHBaseCluster » ExceptionInInitializer
> > >   TestHBaseClusterBuildDestroy.org.apache.slider.providers.hbase.funtest.TestHBaseClusterBuildDestroy » ExceptionInInitializer
> > >   TestHBaseClusterBuildDestroy.org.apache.slider.providers.hbase.funtest.TestHBaseClusterBuildDestroy » NoClassDefFound
> > >   TestHBaseClusterLifecycle.org.apache.slider.providers.hbase.funtest.TestHBaseClusterLifecycle » ExceptionInInitializer
> > >   TestHBaseIntegration.org.apache.slider.providers.hbase.funtest.TestHBaseIntegration » ExceptionInInitializer
> > >   TestHBaseLoad.org.apache.slider.providers.hbase.funtest.TestHBaseLoad » ExceptionInInitializer
> > >   TestHBaseNodeFailure.org.apache.slider.providers.hbase.funtest.TestHBaseNodeFailure » ExceptionInInitializer
> > >   TestImages.org.apache.slider.providers.hbase.funtest.TestImages » ExceptionInInitializer
> > >
> > > Tests run: 20, Failures: 9, Errors: 8, Skipped: 0
> > >
> > > [INFO]
> > > [INFO] Reactor Summary:
> > > [INFO]
> > > [INFO] Slider ........................................ SUCCESS [1.196s]
> > > [INFO] Command Logger ................................ SUCCESS [0.263s]
> > > [INFO] Slider Command Logger App Package ............. SUCCESS [0.594s]
> > > [INFO] Slider Core ................................... SUCCESS [19:22.800s]
> > > [INFO] Slider Agent .................................. SUCCESS [5.984s]
> > > [INFO] Slider Assembly ............................... SUCCESS [0.212s]
> > > [INFO] Slider Functional Tests ....................... SUCCESS [1.206s]
> > > [INFO] Slider Accumulo App Package ................... SUCCESS [0.839s]
> > > [INFO] Slider HBase Provider ......................... SUCCESS [49.557s]
> > > [INFO] Slider HBase Provider Functional Tests ........ FAILURE [7.598s]
> > > [INFO] Slider Accumulo Provider ...................... SKIPPED
> > > [INFO] Slider Accumulo Provider Functional Tests ..... SKIPPED
> > > [INFO] Slider Install ................................ SKIPPED
> > >
> >
> >
> >
>

Re: [VOTE] Apache Slider Incubating Release 0.40.0 RC0

2014-07-11 Thread Gour Saha

+1


On Fri, Jul 11, 2014 at 12:54 PM, Jon Maron  wrote:

> +1
>
> On Jul 11, 2014, at 2:42 AM, Sumit Mohanty 
> wrote:
>
> > Hello folks,
> >
> > This is a call for a vote on Apache Slider 0.40.0 incubating release.
> > Thanks to everyone who has contributed to this release.
> >
> > Git source tag:
> >
> https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=shortlog;h=refs/tags/release-0.40.0-rc0
> >
> > Staging site:
> > http://people.apache.org/~smohanty/slider-release-0.40.0-rc0
> >
> > PGP release keys (signed using 791FDAB0)
> > http://pgp.mit.edu/pks/lookup?op=vindex&search=0xECFC8276791FDAB0
> >
> > One can look into the issues fixed in this release at:
> > https://issues.apache.org/jira/browse/SLIDER/fixforversion/12326825
> >
> > Note that this is a source only release and we are voting on the source.
> > (One .tar file, appdef_1.tar, is a text file used for negative testing)
> >
> > Build instructions at:
> > http://slider.incubator.apache.org/developing/building.html
> >
> > Vote will be open for 72 hours
> >
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> > +1 from me to start.
> >
> > thanks
> > Sumit
> >
>
>
>


Re: [VOTE] Apache Slider Incubating Release 0.40.0 RC0

2014-07-14 Thread Gour Saha

Just to expand on my +1 vote, I did all of the following:

- successfully downloaded the tarball
- verified the md5 and sha checksums and the pgp signature
- followed the steps in the "Getting Started" guide
  (http://slider.incubator.apache.org/docs/getting_started.html) and
  successfully did all of the following:
  * built the slider package
  * ran all tests
  * created the HBase application package
  * made config changes as per the guide, created a cluster and
    started the HBase application using Slider
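
For anyone repeating the checksum part of that verification, here is a
minimal Python sketch. It assumes the published checksum file starts with
the hex digest (some release checksum files use other layouts), and the
function names are made up for illustration. It does not cover the pgp
signature, which still needs gpg and the KEYS file.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_text, algorithm="sha1"):
    """Compare a file's digest with the first token of a checksum file.

    Assumption: the checksum text looks like '<hex digest>  <filename>'.
    """
    tokens = published_text.split()
    if not tokens:
        return False
    return file_digest(path, algorithm) == tokens[0].lower()
```

Calling `matches_published(tarball_path, checksum_text, "md5")` for each
published checksum would cover the md5 and sha steps in the list above.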

-Gour


On Sat, Jul 12, 2014 at 4:16 PM, Josh Elser  wrote:

> +1
>
> * Verified sigs
> * Built from source archive (noted successful RAT check)
> * Successfully ran unit tests from source archive (sans tests that
> required tarballs)
> * Built from git tag
> * Successfully ran core and hbase provider tests (Accumulo provider tests
> failed due to missing hdfs resources on the classpath -- created SLIDER-230)
>
> Will try to poke around some more tonight and/or tomorrow, too.
>
>
> On 7/11/14, 2:42 AM, Sumit Mohanty wrote:
>
>> Hello folks,
>>
>> This is a call for a vote on Apache Slider 0.40.0 incubating release.
>> Thanks to everyone who has contributed to this release.
>>
>> Git source tag:
>> https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=shortlog;h=refs/tags/release-0.40.0-rc0
>>
>> Staging site:
>> http://people.apache.org/~smohanty/slider-release-0.40.0-rc0
>>
>> PGP release keys (signed using 791FDAB0)
>> http://pgp.mit.edu/pks/lookup?op=vindex&search=0xECFC8276791FDAB0
>>
>> One can look into the issues fixed in this release at:
>> https://issues.apache.org/jira/browse/SLIDER/fixforversion/12326825
>>
>> Note that this is a source only release and we are voting on the source.
>> (One .tar file, appdef_1.tar, is a text file used for negative testing)
>>
>> Build instructions at:
>> http://slider.incubator.apache.org/developing/building.html
>>
>> Vote will be open for 72 hours
>>
>> [ ] +1 approve
>> [ ] +0 no opinion
>> [ ] -1 disapprove (and reason why)
>>
>> +1 from me to start.
>>
>> thanks
>> Sumit
>>
>>


Re: Can you look at the updated site for 0.40?

2014-07-22 Thread Gour Saha

Looks good to me.

-Gour


On Tue, Jul 22, 2014 at 11:22 AM, Sumit Mohanty  wrote:

> http://slider.incubator.apache.org/
>
> Specifically, the download links and text around the latest release.
> I will send an announcement in a few hours.
>
>
> The steps needed to release the bits are at:
> http://slider.incubator.apache.org/developing/releasing.html
>
> It's a bit pedantic but I think it is easier to remove steps once the next
> person to release goes over it. IOW, feel free to update the page as
> needed.
>
>
> thanks
> -Sumit
>


Re: Slider and Hadoop versions

2014-08-05 Thread Gour Saha

I had to slap Hadoop-yarn 2.4.1 libraries on top of slider 0.40 to get the
AM restart feature working. It's kind of a hack, so +1 for the plan to move
slider to hadoop 2.5.

-Gour


On Tue, Aug 5, 2014 at 6:41 AM, Jon Maron  wrote:

> Sounds like a viable plan. Do we know the hadoop version that is planned
> for Champlain?
>
> — Jon
>
> On Aug 5, 2014, at 5:31 AM, Steve Loughran  wrote:
>
> > A quick update on Hadoop versions
> >
> > Slider 0.40: works on Hadoop 2.4.0+, but AM restart doesn't work properly
> > on Hadoop 2.4.
> >
> > Hadoop 2.5 is about to RC; I want to do a couple of tests there as part
> > of the Hadoop release process. Slider is the de facto AM failure/restart
> > functional test, so we get to verify it works.
> >
> > Hadoop 2.5 is also in sync with artifacts we need (especially jackson,
> > https://issues.apache.org/jira/browse/HADOOP-10104 ); this syncs it up
> > with Curator. FWIW that update broke Tez, as they were uploading the
> > jackson JARs themselves. Tez worked because their JARs were the Hadoop 2.4
> > versions, and now they are unhappy about the change (
> > https://issues.apache.org/jira/browse/YARN-2092 ). My stance is that we
> > can't lock down Hadoop dependencies for all YARN apps, so unless we do
> > better isolation/dcache, all of us YARN-app authors who want to use the
> > classpath of the target machine will have to deal with that.
> >
> > Hadoop 2.6 is the next Hadoop release after that. This has some things
> > we'd like:
> > - Credential Provider: https://issues.apache.org/jira/browse/HADOOP-10607
> > - hopefully, YARN service registry (
> >   https://issues.apache.org/jira/browse/YARN-913 ). I have a proto registry
> >   design/impl there which I want to stick up for comments by the end of
> >   the week. There's a branch in features/ which uses this; I'm using
> >   slider as a way of seeing if the API is usable.
> > - Java 7 (once I push in that patch)
> >
> > Once Hadoop 2.5 is out I plan to switch to Hadoop 2.6-SNAPSHOT as the
> > base version for slider dev. That is good for Slider and Hadoop itself (we
> > test the features), but it would mean that releases after that point won't
> > even work on Hadoop 2.5.
> >
> > I'm going to propose, therefore:
> >
> > 1. We cut a Slider 0.50 release to immediately follow the Hadoop 2.5
> > release, as the one-and-only Hadoop 2.5+ version; Slider 0.40 will be the
> > last Hadoop 2.4 release. There's no actual source difference between
> > them, we'd just build, test and release against the 2.5 JARs. I already
> > test against branch-2 on Java 8 on one VM, so don't expect surprises.
> >
> > 2. After that release, we switch develop/ to build against Hadoop
> > branch-2 only. Either developers build locally, or they do builds against
> > the apache snapshot repository.
> >
> > Thoughts?
> >
> > -steve
> >
>
>
>


Re: [VOTE] Apache Slider Incubating Release 0.50.2-incubating RC0

2014-08-25 Thread Gour Saha

+1

- verified checksums and signatures
- all unit tests passed
- all 35 functional tests passed

-Gour


On Mon, Aug 25, 2014 at 10:48 AM, Billie Rinaldi 
wrote:

> +1
>
> signatures and checksums are good
> licenses are good
> contains only source, matches the git tag
> ran all unit tests (except the hbase ones) and some funtests
>
>
> On Fri, Aug 22, 2014 at 5:31 AM, Steve Loughran 
> wrote:
>
> > Hello folks,
> >
> > This is a call for a vote on Apache Slider 0.50.2-incubating release.
> >
> > This is a rebuild of the 0.50.1-incubating release with the following
> > changes:
> >
> > * SLIDER-331: get-hbase.sh has the ASF header
> > * .zip file included alongside .tar.gz
> > * my public key is in both the KEYS file and on the MIT PGP server
> >
> > Git source tag:
> > https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=shortlog;h=refs/tags/release-0.50.2-incubating-rc0
> >
> > Staging site:
> > http://people.apache.org/~stevel/slider-release-0.50.2-incubating-rc0
> >
> > PGP release keys (signed using ste...@apache.org)
> > http://pgp.mit.edu:11371/pks/lookup?op=vindex&search=ste...@apache.org
> >
> > One can look into the issues fixed in this release at:
> > https://issues.apache.org/jira/browse/SLIDER/fixforversion/12326949/
> >
> > Note that this is a source only release and we are voting on the source.
> > (One .tar file is a text file used for negative testing)
> >
> > Build instructions at:
> > http://slider.incubator.apache.org/developing/building.html
> >
> > Vote will be open for 72 hours
> >
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> >
> > -steve
> >
> >
>


Re: Slider-client properties when RM HA is enabled

2014-09-16 Thread Gour Saha

I think the following are also required -

yarn.resourcemanager.recovery.enabled
yarn.resourcemanager.store.class


-Gour

On Tue, Sep 16, 2014 at 6:01 PM, Sumit Mohanty 
wrote:

> Ran some tests. Looks like the easiest option is to drop yarn-site.xml in
> Slider conf dir.
>
> However, if hadoop is installed on the host and /etc/hadoop/conf exists,
> should we just use the site.xml files at /etc/hadoop/conf?
>
> -Sumit
>
> On Tue, Sep 16, 2014 at 5:45 PM, Sumit Mohanty 
> wrote:
>
> > Looks like we need the following properties in the slider-client when RM
> > HA is enabled.
> >
> > * yarn.resourcemanager.ha.enabled
> > * yarn.resourcemanager.ha.rm-ids
> > * yarn.resourcemanager.hostname.[one entry per ids specified above]
> > * yarn.resourcemanager.zk-address
> > * yarn.resourcemanager.cluster-id
> >
> > Does the list look complete?
> >
> > *Steve*, related question:
> > Should we go the route where we drop yarn-site.xml in the Slider config
> > directory and it gets read by the client? *Or, does it already work?*
> >
> > *-*Sumit
> >
> >
>
>
>
> --
> thanks
> Sumit
>


Re: Slider-client properties when RM HA is enabled

2014-09-18 Thread Gour Saha

Here is a comprehensive list of all the properties I had to add to
slider-client.xml to make the Slider client work against an RM HA enabled
cluster.

Note: I had a 3-node vagrant cluster: c6401, c6402 and c6403. The one thing
I am not sure about is that I still had to specify the property
"yarn.resourcemanager.address". When my cluster was non-HA its value was
set to "c6402.ambari.apache.org:8050", and I left it as is.

Removing that entry produced the following error:

2014-09-18 00:19:25,760 [main] ERROR main.ServiceLauncher - No valid
Resource Manager address provided in the argument --manager or the
configuration property yarn.resourcemanager.address value :
0.0.0.0/0.0.0.0:8032

*Here are all the RM HA related properties -*


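<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-cluster</value>
</property>

<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>

<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>

<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>c6402.ambari.apache.org</value>
</property>

<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>c6401.ambari.apache.org</value>
</property>

<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>

<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>

<property>
  <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
  <value>/yarn-leader-election</value>
</property>

<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>c6401.ambari.apache.org,c6402.ambari.apache.org,c6403.ambari.apache.org</value>
</property>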
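
To check whether a given slider-client.xml carries the properties listed
above, a small Python sketch along these lines might help. It assumes the
usual Hadoop `<configuration><property><name>/<value>` layout; the key list
simply mirrors this mail, and the function names are hypothetical. It is not
an authoritative statement of what YARN requires.

```python
import xml.etree.ElementTree as ET

# Keys taken from the list in this mail; per-RM hostname keys are derived
# from yarn.resourcemanager.ha.rm-ids below.
RM_HA_KEYS = [
    "yarn.resourcemanager.cluster-id",
    "yarn.resourcemanager.ha.enabled",
    "yarn.resourcemanager.ha.rm-ids",
    "yarn.resourcemanager.recovery.enabled",
    "yarn.resourcemanager.store.class",
    "yarn.resourcemanager.ha.automatic-failover.zk-base-path",
    "yarn.resourcemanager.zk-address",
]


def load_properties(xml_text):
    """Parse Hadoop-style configuration XML into a name -> value dict."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_rm_ha_keys(props):
    """List the RM HA keys from this mail that are absent from props."""
    missing = [key for key in RM_HA_KEYS if key not in props]
    rm_ids = props.get("yarn.resourcemanager.ha.rm-ids", "")
    for rm_id in (part.strip() for part in rm_ids.split(",")):
        if rm_id and "yarn.resourcemanager.hostname." + rm_id not in props:
            missing.append("yarn.resourcemanager.hostname." + rm_id)
    return missing
```

Reading the file contents and passing them through `load_properties` and
then `missing_rm_ha_keys` gives a quick list of what still needs adding.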



-Gour


On Wed, Sep 17, 2014 at 7:53 AM, Jon Maron  wrote:

> How do we know we have a valid hadoop conf on the client?  Should we
> augment client error messages/info messages to indicate where properties
> are being found?
>
> On Wed, Sep 17, 2014 at 10:28 AM, Billie Rinaldi wrote:
>
> > That would be great if we could drop the hadoop conf properties from the
> > slider-client.xml when we have a good hadoop conf available.
> >
> > On Wed, Sep 17, 2014 at 2:19 AM, Steve Loughran wrote:
> >
> > > I think there's some JIRAs on that. What we could do is add the hadoop
> > > conf dir env variables to the classpath (put them after slider-conf dir
> > > to pick up slider's log4j first).
> > >
> > > I'd argue for only doing that in the .py script and then we change the
> > > build to copy bin/slider.py to bin/slider (somehow) so there is then
> > > only one script to keep up to date. And as the script gets more complex,
> > > switching to python gets more and more compelling
> > >
> > >
> > > On 17 September 2014 02:01, Sumit Mohanty wrote:
> > >
> > > > Ran some tests. Looks like the easiest option is to drop yarn-site.xml
> > > > in Slider conf dir.
> > > >
> > > > However, if hadoop is installed on the host and /etc/hadoop/conf
> > > > exists, should we just use the site.xml files at /etc/hadoop/conf?
> > > >
> > > > -Sumit
> > > >
> > > > On Tue, Sep 16, 2014 at 5:45 PM, Sumit Mohanty
> > > > <smoha...@hortonworks.com> wrote:
> > > >
> > > > > Looks like we need the following properties in the slider-client
> > > > > when RM HA is enabled.
> > > > >
> > > > > * yarn.resourcemanager.ha.enabled
> > > > > * yarn.resourcemanager.ha.rm-ids
> > > > > * yarn.resourcemanager.hostname.[one entry per ids specified above]
> > > > > * yarn.resourcemanager.zk-address
> > > > > * yarn.resourcemanager.cluster-id
> > > > >
> > > > > Does the list look complete?
> > > > >
> > > > > *Steve*, related question:
> > > > > Should we go the route where we drop yarn-site.xml in the Slider
> > > > > config directory and it gets read by the client? *Or, does it
> > > > > already work?*
> > > > >
> > > > > *-*Sumit
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > thanks
> > > > Sumit
> > > >
> > >

Re: new committer

2014-09-27 Thread Gour Saha

Thank you everyone for the warm welcome. Looking forward to continuing to
add value to the project.

-Gour


On Sat, Sep 27, 2014 at 7:58 AM, Sumit Mohanty 
wrote:

> Congratulations Gour.
>
> On Sat, Sep 27, 2014 at 7:35 AM, Jon Maron  wrote:
>
> > Welcome!
> >
> > Going Mobile
> >
> >
> > > On Sep 27, 2014, at 9:13 AM, Billie Rinaldi 
> > wrote:
> > >
> > > Welcome Gour Saha, a new committer and PPMC member for Apache Slider!
> > >
> > > Billie
> >
> >
>
>
>
> --
> thanks
> Sumit
>


Re: Slider-develop - Build # 312 - Still Failing

2014-10-13 Thread Gour Saha

I am fixing this now.

-Gour

On Mon, Oct 13, 2014 at 8:21 PM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> The Apache Jenkins build system has built Slider-develop (build #312)
>
> Status: Still Failing
>
> Check console output at https://builds.apache.org/job/Slider-develop/312/
> to view the results.


Re: One question about flex method

2014-10-16 Thread Gour Saha

Today, Slider has no option to provide a list of preferred nodes on which
containers should be stopped; it pretty much selects them at random. We do
have code to release the most recently created containers first, but have
not exposed it as an option in the Slider client yet.

If you think that Slider should provide such a capability then feel free to
file a JIRA.
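
To make the two behaviours concrete, here is a minimal sketch in Python. It is
not Slider's actual (Java) code; the Container type, its fields, and both
function names are made up for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass
class Container:
    # Hypothetical stand-in for a running container; not a Slider class.
    container_id: str
    create_time: int  # creation time in epoch milliseconds


def pick_random(containers, count, rng=None):
    """Choose `count` containers to release at random (roughly today's behaviour)."""
    rng = rng or random.Random()
    return rng.sample(containers, min(count, len(containers)))


def pick_newest_first(containers, count):
    """Choose the `count` most recently created containers to release."""
    ordered = sorted(containers, key=lambda c: c.create_time, reverse=True)
    return ordered[:count]
```

Flexing a component from 3 instances down to 1 would call one of these with
count=2 and release whatever comes back.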

-Gour

On Thu, Oct 16, 2014 at 8:35 AM, Rui Zhang  wrote:

> Hi. everyone.
>
> When the new size is less than the original size in the flex command, how
> to determine which nodes to stop?
>
> Thanks.
>
> --
> Rui Zhang
> Software engineer Intern
> Vertica, an HP Company
> rzh...@vertica.com
>
>


Re: Running 'python slider.py version' throws StackOverflowError

2014-10-27 Thread Gour Saha

Which OS are you running this on? What version of Python are you using? Can
you run "./slider version" instead of "python slider.py version" (if you
are not on Windows) and see what output you get?

-Gour

On Mon, Oct 27, 2014 at 9:26 AM, Pushkar Raste 
wrote:

> When I run 'python slider.py version' I get the following error:
>
> root@hdfs03:/home/ubuntu/memcached# python
> /usr/local/slider/slider-0.40/bin/slider.py version
> slider_home = "/usr/local/slider/slider-0.40"
> slider_jvm_opts = "-Djava.net.preferIPv4Stack=true -Djava.awt.headless=true
> -Xmx256m -Djava.confdir=/usr/local/slider/slider-0.40/conf"
> slider_classpath =
> "/usr/local/slider/slider-0.40/lib/*:/usr/local/slider/slider-0.40/conf:"
> ready to exec : ['java', '-Djava.net.preferIPv4Stack=true',
> '-Djava.awt.headless=true', '-Xmx256m',
> '-Djava.confdir=/usr/local/slider/slider-0.40/conf', '-classpath',
> '/usr/local/slider/slider-0.40/lib/*:/usr/local/slider/slider-0.40/conf:',
> 'org.apache.slider.Slider', 'version']
> Exception: java.lang.StackOverflowError
> 2014-10-27 16:22:28,961 [main] ERROR main.ServiceLauncher - Exception:
> java.lang.StackOverflowError
> java.lang.StackOverflowError
> at java.util.Arrays.copyOf(Arrays.java:2219)
> at java.util.ArrayList.grow(ArrayList.java:242)
> at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
> at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
> at java.util.ArrayList.add(ArrayList.java:440)
> at org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:580)
> at org.apache.hadoop.conf.Configuration.get(Configuration.java:1065)
> at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:175)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
>
> What am I missing
>


Re: forking off releases/branch-0.60 to work with hadoop-2.6

2014-11-05 Thread Gour Saha

Steve,

The changes for SLIDER-555 (AM log4j) have been merged into develop and
releases/slider-0.60.

-Gour

On Wed, Nov 5, 2014 at 12:50 PM, Steve Loughran 
wrote:

> FYI, there's going to be an RC of Hadoop 2.6 this weekend. To celebrate
> this I'm creating a slider-0.60 release which will be in sync.
>
> 1. The branch already exists: releases/branch-0.60  please try and
> stabilize this. new features into develop/
>
> 2. I did one last-minute feature addition to slider today, before this fork
> : SLIDER-619 
>
> the registry --list and --listconf commands support the --out argument to
> take a file; if set it saves the output to a text file. This is for
> testing.
>
> Gour: I know you want to get your work on server-side logging in: once you
> are happy with it commit to develop/ and then cherry pick over the 0.60
> branch
>
>


Re: Maven repository?

2014-11-06 Thread Gour Saha

Until then, you should be able to make do by checking out the git repo,
running mvn install -DskipTests (to get it into your .m2), and then using this
(for develop) -

  <dependency>
    <groupId>org.apache.slider</groupId>
    <artifactId>slider-core</artifactId>
    <version>0.61-incubating</version>
  </dependency>

-Gour

On Thu, Nov 6, 2014 at 6:51 AM, Billie Rinaldi 
wrote:

> Here's some info on publishing maven artifacts:
> http://www.apache.org/dev/publishing-maven-artifacts.html
> Although I think hadoop doesn't inherit from the ASF pom and just copies
> the stuff it needs into its own pom.  We may already have some of the
> things we need in the distribution management section.
> The artifacts will have to be staged and voted upon, like our other release
> artifacts.
>
> On Thu, Nov 6, 2014 at 2:06 AM, Steve Loughran 
> wrote:
>
> > we haven't published to maven yet; I'll see about what it will take to do
> > that with the 0.60 release against the forthcoming hadoop-2.6-RC0. You
> > don't want to publish things to maven that depend on -SNAPSHOT stuff, as it
> > is too brittle: every morning you end up getting a different version of the
> > binaries. If they are incompatible you get this rolling wave of application
> > failure travelling round the world as midnight crosses the timezones. I've
> > been in the situation where a groovy snapshot broke; I found it an hour
> > after germany and france; we were left filing bugs and workarounds until
> > the US woke up and fixed it...
> >
> > -steve
> >
> > On 6 November 2014 03:36, hsy...@gmail.com  wrote:
> >
> > > Is this a git repository? What I mean is, if I want to write some code that
> > > depends on the slider library, how would I include the dependency in pom.xml?
> > >
> > > Thanks!
> > >
> > >
> > > On Wed, Nov 5, 2014 at 7:23 PM, Ted Yu  wrote:
> > >
> > > > Slider maven repo is here:
> > > >
> > > > https://git-wip-us.apache.org/repos/asf/incubator-slider.git
> > > >
> > > >
> > > > You can checkout develop branch.
> > > >
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Wed, Nov 5, 2014 at 6:36 PM, hsy...@gmail.com 
> > > wrote:
> > > >
> > > > > Is there a public maven repository that I can checkout the slider
> > > > library?
> > > > >
> > > > > Best,
> > > > > Siyuan
> > > > >
> > > >
> > >
> >
> >
>


Re: How to change log level for Slider AM?

2014-11-14 Thread Gour Saha

Under the conf directory there is a file named *log4j-server.properties*. Just
edit that file and change the level in *log4j.rootLogger* from INFO to
whatever you like.

Then do a slider stop and a slider start. The new level for the Slider AM will
take effect. The *slider.log* file is also rotated on a 100KB size threshold.

This change was made fairly recently so you need to be on the latest
develop or releases/slider-0.60 branch.
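
If you would rather script the edit than do it by hand, a rough sketch in
Python could look like the following. The function name and the sample
appender name are hypothetical, and it assumes the usual log4j 1.x layout
where the level is the first token after "log4j.rootLogger=".

```python
import re

# Matches a line such as "log4j.rootLogger=INFO,someAppender" and captures
# the key, the level, and whatever follows the level (appender list).
_ROOT_LOGGER = re.compile(
    r"^([ \t]*log4j\.rootLogger[ \t]*=[ \t]*)(\w+)(.*)$", re.MULTILINE
)


def set_root_log_level(properties_text, new_level):
    """Return the properties text with the rootLogger level replaced."""
    updated, count = _ROOT_LOGGER.subn(
        lambda m: m.group(1) + new_level + m.group(3), properties_text
    )
    if count == 0:
        raise ValueError("log4j.rootLogger not found")
    return updated


if __name__ == "__main__":
    # "someAppender" is a placeholder, not the name used in Slider's file.
    sample = "log4j.rootLogger=INFO,someAppender\n"
    print(set_root_log_level(sample, "DEBUG"), end="")
```

You would read conf/log4j-server.properties, pass its contents through the
function, write the result back, and then do the stop and start as above.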

-Gour

On Fri, Nov 14, 2014 at 3:57 PM, hsy...@gmail.com  wrote:

> How to change log level for Slider AM?
>
> Thanks!
>


Re: [VOTE] Apache Slider Incubating Release 0.60.0-incubating RC1

2014-11-15 Thread Gour Saha

+ 1

- successfully downloaded tarball
- verified md5, sha signatures
- followed the steps in the "Getting Started" guide
  http://slider.incubator.apache.org/docs/getting_started.html and
  successfully did all of the following:
  * built the slider package - mvn clean site:site site:stage package -DskipTests
  * ran all tests successfully - mvn -Dslider.conf.dir=src/test/clusters/offline/slider/ clean install
  * created the HBase application package
  * made config changes as per the guide, created a cluster and
    successfully started the HBase application using Slider
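
For anyone who wants to repeat the checksum step, here is a minimal sketch in
Python. It is only an illustration, not the procedure used for this vote: the
function names are made up, and it assumes you have already copied the
published hex digest out of the checksum file by hand.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Compute the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex, algorithm="sha1"):
    """Compare a file's digest with a published value, ignoring case and padding."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Call matches(tarball_path, published_md5, "md5") for the md5 check and again
with the sha algorithm named by the release. This covers checksums only; the
PGP signature still has to be verified separately with gpg.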

-Gour


On Fri, Nov 14, 2014 at 2:28 PM, Sumit Mohanty 
wrote:

> +1.
>
> Modify the following to rc1,
>
> >>> source at
> >>>
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=shortlog;h=refs/tags/release-0.60.0-incubating-rc0
>
> Verified SHA and MD5 for both the tar ball and the zip file.
> The pgp signatures also verify against the public key.
>
> Created a clean VM and used the tar ball to successfully build and run unit
> tests.
>
> The tar ball builds fine and all unit tests pass. I used "mvn clean -X
>
> -Dslider.conf.dir=/usr/work/tar/apache-slider-0.60.0-incubating/src/test/clusters/offline/slider/
> package -Prat" which is equivalent of what jenkins uses.
>
> A note may be added for the Incubator General vote email that the file
>
> "./apache-slider-0.60.0-incubating/slider-core/src/test/python/appdef_1.tar"
> is not a binary file. Earlier it was raised as a question in case it is a
> binary file.
>
> thanks
> -Sumit
>
> On Fri, Nov 14, 2014 at 11:26 AM, Steve Loughran 
> wrote:
>
> > Hi
> >
> > The updated RC1 release of slider is up for review and vote. Please
> > download and review it
> >
> >
> > all changes in this release are listed at:
> >
> >
> https://issues.apache.org/jira/browse/SLIDER/fixforversion/12327198/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel
> >
> > Main changes since the RC0
> >
> >- fixed license problem
> >- SLIDER-647: allocation requests not being satisfied when a cluster
> >goes to labels
> >(the default placement policy is now "lax"; you can request "strict" on
> >an application instance or component-by-component basis)
> >- SLIDER-650 regression: local zk nodes not being deleted on instance
> >destroy (well spotted Sumit!)
> >
> >  Note that this is a source only release and we are voting on the source.
> >
> >  artifacts at
> >
> >
> http://people.apache.org/~stevel/slider/slider-release-0.60.0-incubating-rc1
> >
> >  source at
> >
> >
> https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=shortlog;h=refs/tags/release-0.60.0-incubating-rc0
> >
> >  PGP keys at
> > http://pgp.mit.edu:11371/pks/lookup?op=vindex&search=ste...@apache.org
> >
> >   Build instructions at:
> > http://slider.incubator.apache.org/developing/building.html
> >
> > Vote will be open for 72 hours
> >
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> > -Steve
> >
> >
>
>


Re: Application configuration page is outdated?

2014-11-18 Thread Gour Saha

Given you moved to 0.60 it would be a good idea to start with the default
file in this branch - ./app-packages/hbase/appConfig-default.json

The site documentation is getting updated as we speak.

-Gour

On Tue, Nov 18, 2014 at 2:14 PM, hsy...@gmail.com  wrote:

> I just figured it out:
>
> config_types is changed to system_configs
>
> Or you can have configFile in metainfo.xml and create configuration
> folder with xml or env configuration files in it
>
>
> On Tue, Nov 18, 2014 at 2:04 PM, Sumit Mohanty 
> wrote:
>
> > Sharing the metainfo.xml, appConfig.json, and resources.json should be
> > enough.
> >
> > -Sumit
> >
> > On Tue, Nov 18, 2014 at 2:00 PM, Sumit Mohanty  >
> > wrote:
> >
> > > Is it possible for you to share the application package? I can browse
> > > through it to see what needs to change.
> > >
> > > On Tue, Nov 18, 2014 at 1:48 PM, hsy...@gmail.com 
> > > wrote:
> > >
> > >> Hi guys,
> > >>
> > >> I just switched from 40 to 60 and found that the application configuration
> > >> doesn't work any more.
> > >>
> > >>
> >
> http://slider.incubator.apache.org/docs/slider_specs/application_instance_configuration.html
> > >>
> > >> For example :
> > >>
> > >> {
> > >>   "schema" : "http://example.org/specification/v2.0.0",
> > >>   "metadata" : {
> > >>   },
> > >>   "global" : {
> > >>   "config_types": "core-site,hdfs-site,hbase-site",
> > >>
> > >>   "java_home": "/usr/jdk64/jdk1.7.0_45",
> > >>   "package_list": "files/hbase-0.96.1-hadoop2-bin.tar",
> > >>   "create.default.zookeeper.node": "true",
> > >>
> > >>   "site.global.app_user": "yarn",
> > >>   "site.global.app_log_dir": "${AGENT_LOG_ROOT}/app/log",
> > >>   "site.global.app_pid_dir": "${AGENT_WORK_ROOT}/app/run",
> > >>   "site.global.security_enabled": "false",
> > >>
> > >>   "site.hbase-site.hbase.hstore.flush.retries.number": "120",
> > >>   "site.hbase-site.hbase.client.keyvalue.maxsize": "10485760",
> > >>   "site.hbase-site.hbase.hstore.compactionThreshold": "3",
> > >>   "site.hbase-site.hbase.rootdir": "${NN_URI}/apps/hbase/data",
> > >>   "site.hbase-site.hbase.tmp.dir": "${AGENT_WORK_ROOT}/work/app/tmp",
> > >>   "site.hbase-site.hbase.master.info.port": "${HBASE_MASTER.ALLOCATED_PORT}",
> > >>   "site.hbase-site.hbase.regionserver.port": "0",
> > >>   "site.hbase-site.zookeeper.znode.parent": "${DEF_ZK_PATH}",
> > >>
> > >>   "site.core-site.fs.defaultFS": "${NN_URI}",
> > >>   "site.hdfs-site.dfs.namenode.https-address": "${NN_HOST}:50470",
> > >>   "site.hdfs-site.dfs.namenode.http-address": "${NN_HOST}:50070"
> > >>   }
> > >> }
> > >>
> > >>
> > >> I can't get config_types in agent python script anymore. Only global
> > >> properties are set and caught
> > >>
> > >>
> > >>
> > >> Thanks!
> > >>
> > >
> > >
> >
> >
>


Re: how to get get the port of memcached

2014-12-10 Thread Gour Saha

What do you get when you call "slider status "?

-Gour

On Wed, Dec 10, 2014 at 1:02 AM, 杨浩  wrote:

> Hi, I have installed the jmemcached successfully, but how can I use it, or
> how to get the port of memcached
>


Re: how to get get the port of memcached

2014-12-15 Thread Gour Saha

live.time" : "12 Dec 2014 04:13:35 GMT",
> "live.time.millis" : "1418357615354",
> "create.time" : "12 Dec 2014 04:13:35 GMT",
> "create.time.millis" : "1418357615354",
> "containers.at.am-restart" : "0",
> "status.time" : "12 Dec 2014 04:22:58 GMT",
> "status.time.millis" : "1418358178437"
>   },
>   "statistics" : {
> "MEMCACHED" : {
>   "containers.start.started" : 1,
>   "containers.live" : 1,
>   "containers.start.failed" : 0,
>   "containers.active.requests" : 0,
>   "containers.failed" : 0,
>   "containers.completed" : 0,
>   "containers.desired" : 1,
>   "containers.requested" : 1
> },
> "slider-appmaster" : {
>   "containers.unknown.completed" : 1,
>   "containers.start.started" : 1,
>   "containers.live" : 2,
>   "containers.start.failed" : 0,
>   "containers.failed" : 0,
>   "containers.completed" : 0,
>   "containers.surplus" : 0
> }
>   },
>   "instances" : {
> "MEMCACHED" : [ "container_1418350976699_0004_03_02" ],
> "slider-appmaster" : [ "container_1418350976699_0004_03_01" ]
>   },
>   "roles" : {
> "MEMCACHED" : {
>   "yarn.memory" : "256",
>   "yarn.role.priority" : "1",
>   "role.requested.instances" : "0",
>   "role.failed.starting.instances" : "0",
>   "role.actual.instances" : "1",
>   "yarn.component.instances" : "1",
>   "role.releasing.instances" : "0",
>   "role.failed.instances" : "0"
> },
>     "slider-appmaster" : {
>   "yarn.memory" : "1024",
>   "role.requested.instances" : "0",
>   "role.failed.starting.instances" : "0",
>   "role.actual.instances" : "1",
>   "yarn.vcores" : "1",
>   "yarn.component.instances" : "1",
>   "role.releasing.instances" : "0",
>   "role.failed.instances" : "0"
> }
>   },
>   "clientProperties" : { },
>   "status" : {
> "live" : {
>   "MEMCACHED" : {
> "container_1418350976699_0004_03_02" : {
>   "name" : "container_1418350976699_0004_03_02",
>   "role" : "MEMCACHED",
>   "roleId" : 1,
>   "createTime" : 1418357617294,
>   "startTime" : 1418357617328,
>   "released" : false,
>   "host" : "localhost",
>   "state" : 3,
>   "exitCode" : 0,
>   "command" : "python ./infra/agent/slider-agent/agent/main.py
> --label container_1418350976699_0004_03_02___MEMCACHED --zk-quorum
> 127.0.0.1:2181 --zk-reg-path
> /registry/users/yang/services/org-apache-slider/memcached1 >
> /slider-agent.out 2>&1 ; ",
>   "diagnostics" : "",
>   "environment" : [ "AGENT_WORK_ROOT=\"$PWD\"",
> "HADOOP_USER_NAME=\"yang\"", "AGENT_LOG_ROOT=\"\"",
> "PYTHONPATH=\"./infra/agent/slider-agent/\"",
> "SLIDER_PASSPHRASE=\"aa178fGHttfGC7Cnss3DPbLzYDEmqJuDcCUNwAW2YUfyPNQMZN\""
> ]
> }
>   },
>   "slider-appmaster" : {
> "container_1418350976699_0004_03_01" : {
>   "name" : "container_1418350976699_0004_03_01",
>   "role" : "slider-appmaster",
>   "roleId" : 0,
>   "createTime" : 0,
>   "startTime" : 0,
>   "released" : false,
>   "host" : "yang",
>   "state" : 3,
>   "exitCode" : 0,
>   "command" : "",
>   "diagnostics" : ""
> }
>   }
> }
>   }
> }
> 2014-12-12 12:22:58,598 [main] INFO  util.ExitUtil - Exiting with status 0
>
>
> 2014-12-11 1:01 GMT+08:00 Gour Saha :
> >
> > What do you get when you call "slider status "?
> >
> > -Gour
> >
> > On Wed, Dec 10, 2014 at 1:02 AM, 杨浩  wrote:
> >
> > > Hi, I have installed the jmemcached successfully, but how can I use it,
> > or
> > > how to get the port of memcached
> > >
> >
> >
>


Re: how to get get the port of memcached

2014-12-22 Thread Gour Saha

Do you mean REST API?

Significant work is going on in exposing REST API in slider for the next
major release. We still don't know the best way to expose a REST API to
retrieve the AM host:port (via YARN REST API maybe) as the REST endpoint
itself will be served by the Slider AM host:port, but will surely come up
with an elegant solution. Suggestions are welcome!!

Check the uber jira for more details -
https://issues.apache.org/jira/browse/SLIDER-151

-Gour

On Mon, Dec 22, 2014 at 1:50 AM, 杨浩  wrote:

> Hi, I got the AM port through the shell command "slider list
> "+applicationName+" --state RUNNING", but after arguing with my boss, we think
> it's an ugly way to do it in a production env.
>
> Can we get the AM host:port through a Java API?
>
> 2014-12-16 9:07 GMT+08:00 Gour Saha :
>
> > Once the app is up and running can you hit the following url and copy
> paste
> > what you see?
> >
> > http://yang:8088/proxy//ws/v1/slider/publisher/slider
> >
> > where the  will be the value from the property "*info.am.app.id*" in the
> > status output above.
> >
> > -Gour
> >
> > On Thu, Dec 11, 2014 at 8:23 PM, 杨浩  wrote:
> > >
> > > yang@yang:/usr/local/slider$ slider status memcached1
> > > 2014-12-12 12:22:58,305 [main] INFO  client.RMProxy - Connecting to
> > > ResourceManager at yang/127.0.0.1:8032
> > > 2014-12-12 12:22:58,597 [main] INFO  client.SliderClient - {
> > >   "version" : "1.0",
> > >   "name" : "memcached1",
> > >   "type" : "agent",
> > >   "state" : 3,
> > >   "createTime" : 1418357615354,
> > >   "updateTime" : 1418357615603,
> > >   "originConfigurationPath" :
> > > "hdfs://yang:8020/user/yang/.slider/cluster/memcached1/snapshot",
> > >   "generatedConfigurationPath" :
> > > "hdfs://yang:8020/user/yang/.slider/cluster/memcached1/generated",
> > >   "dataPath" :
> > > "hdfs://yang:8020/user/yang/.slider/cluster/memcached1/database",
> > >   "options" : {
> > > "slider.am.restart.supported" : "true",
> > > "site.global.security_enabled" : "false",
> > > "internal.application.home" : null,
> > > "internal.queue" : "default",
> > > "application.name" : "memcached1",
> > > "slider.cluster.directory.permissions" : "0770",
> > > "site.global.slider.allowed.ports" : "48000, 49000, 50001-50010",
> > > "internal.tmp.dir" :
> > > "hdfs://yang:8020/user/yang/.slider/cluster/memcached1/tmp",
> > > "java_home" : "/opt/soft/jdk",
> > > "internal.snapshot.conf.path" :
> > > "hdfs://yang:8020/user/yang/.slider/cluster/memcached1/snapshot",
> > > "env.MALLOC_ARENA_MAX" : "4",
> > > "zookeeper.path" : "/services/slider/users/yang/memcached1",
> > > "internal.container.failure.shortlife" : "6",
> > > "internal.application.image.path" : null,
> > > "internal.generated.conf.path" :
> > > "hdfs://yang:8020/user/yang/.slider/cluster/memcached1/generated",
> > > "site.fs.default.name" : "hdfs://yang:8020",
> > > "site.global.additional_cp" : "/usr/lib/hadoop/lib/*",
> > > "zookeeper.hosts" : "127.0.0.1",
> > > "internal.provider.name" : "agent",
> > > "internal.data.dir.path" :
> > > "hdfs://yang:8020/user/yang/.slider/cluster/memcached1/database",
> > > "site.fs.defaultFS" : "hdfs://yang:8020",
> > > "site.global.memory_val" : "200M",
> > > "slider.data.directory.permissions" : "0770",
> > > "site.global.listen_port" :
> > > "${MEMCACHED.ALLOCATED_PORT}{PER_CONTAINER}",
> > > "zookeeper.quorum" : "127.0.0.1:2181",
> > > "site.global.xmx_val" : "256m",
> > > "internal.am.tmp.dir" :
> > > "hdfs://yang:8020/user/yang/.slider/cluster/memcached1/tmp/appmaster",
> > > "application.def" :
> ".sl

Re: how to get get the port of memcached

2014-12-22 Thread Gour Saha

The Slider shell is just a client. You cannot expose a REST API on top of a
client. There is no service running until you create/deploy an application
and the AM is running.

-Gour

On Mon, Dec 22, 2014 at 10:45 PM, 杨浩  wrote:

> I think one way to do so is to expose a REST API that returns the result of
> the slider shell command
>
> 2014-12-23 14:22 GMT+08:00 Gour Saha :
>
> > Do you mean REST API?
> >
> > Significant work is going on in exposing REST API in slider for the next
> > major release. We still don't know the best way to expose a REST API to
> > retrieve the AM host:port (via YARN REST API maybe) as the REST endpoint
> > itself will be served by the Slider AM host:port, but will surely come up
> > with an elegant solution. Suggestions are welcome!!
> >
> > Check the uber jira for more details -
> > https://issues.apache.org/jira/browse/SLIDER-151
> >
> > -Gour
> >
> > On Mon, Dec 22, 2014 at 1:50 AM, 杨浩  wrote:
> >
> > > Hi, I got the AM port through the shell command "slider list
> > > "+applicationName+" --state RUNNING", but after arguing with my boss, we think
> > > it's an ugly way to do it in a production env.
> > >
> > > Can we get the AM host:port through a Java API?
> > >
> > > 2014-12-16 9:07 GMT+08:00 Gour Saha :
> > >
> > > > Once the app is up and running can you hit the following url and copy
> > > paste
> > > > what you see?
> > > >
> > > > http://yang:8088/proxy/
> /ws/v1/slider/publisher/slider
> > > >
> > > > where the  will be the value from the property "*info.am.app.id*" in the
> > > > status output above.
> > > >
> > > > -Gour
> > > >
> > > > On Thu, Dec 11, 2014 at 8:23 PM, 杨浩  wrote:
> > > > >
> > > > > yang@yang:/usr/local/slider$ slider status memcached1
> > > > > 2014-12-12 12:22:58,305 [main] INFO  client.RMProxy - Connecting to
> > > > > ResourceManager at yang/127.0.0.1:8032
> > > > > 2014-12-12 12:22:58,597 [main] INFO  client.SliderClient - {
> > > > >   "version" : "1.0",
> > > > >   "name" : "memcached1",
> > > > >   "type" : "agent",
> > > > >   "state" : 3,
> > > > >   "createTime" : 1418357615354,
> > > > >   "updateTime" : 1418357615603,
> > > > >   "originConfigurationPath" :
> > > > > "hdfs://yang:8020/user/yang/.slider/cluster/memcached1/snapshot",
> > > > >   "generatedConfigurationPath" :
> > > > > "hdfs://yang:8020/user/yang/.slider/cluster/memcached1/generated",
> > > > >   "dataPath" :
> > > > > "hdfs://yang:8020/user/yang/.slider/cluster/memcached1/database",
> > > > >   "options" : {
> > > > > "slider.am.restart.supported" : "true",
> > > > > "site.global.security_enabled" : "false",
> > > > > "internal.application.home" : null,
> > > > > "internal.queue" : "default",
> > > > > "application.name" : "memcached1",
> > > > > "slider.cluster.directory.permissions" : "0770",
> > > > > "site.global.slider.allowed.ports" : "48000, 49000,
> 50001-50010",
> > > > > "internal.tmp.dir" :
> > > > > "hdfs://yang:8020/user/yang/.slider/cluster/memcached1/tmp",
> > > > > "java_home" : "/opt/soft/jdk",
> > > > > "internal.snapshot.conf.path" :
> > > > > "hdfs://yang:8020/user/yang/.slider/cluster/memcached1/snapshot",
> > > > > "env.MALLOC_ARENA_MAX" : "4",
> > > > > "zookeeper.path" : "/services/slider/users/yang/memcached1",
> > > > > "internal.container.failure.shortlife" : "6",
> > > > > "internal.application.image.path" : null,
> > > > > "internal.generated.conf.path" :
> > > > > "hdfs://yang:8020/user/yang/.slider/cluster/memcached1/generated",
> > > > > "site.fs.default.na

Re: Locality results in instance shut-down due to single bad instance

2015-01-06 Thread Gour Saha

Try setting property *yarn.component.placement.policy* to 2 for the
component, something like this -

"HBASE_MASTER": {
  "yarn.role.priority": "1",
  "yarn.component.instances": "1",
  "yarn.memory": "1500",
  "yarn.component.placement.policy": "2"
},

-Gour

On Tue, Jan 6, 2015 at 3:33 PM, Nitin Aggarwal  wrote:

> Hi,
>
> We keep running into a scenario where one of the nodes in the cluster goes
> bad (clock out of sync, no disk space, etc.). As a result the container
> fails to start and, due to locality, it is assigned to the same machine
> again and again, and it fails again and again. After a few failures, when
> the failure threshold is reached (which is currently also not reset
> correctly, see SLIDER-629), it triggers instance shut-down.
>
> Is there a way to give up locality after multiple failures, to avoid
> this scenario?
>
> Thanks
> Nitin Aggarwal
>


Re: rm-ing the surplus slider-agent.tar.gz from the slider tar file

2015-01-06 Thread Gour Saha

+1 for getting rid of the /agent one

On Tue, Jan 6, 2015 at 9:23 AM, Sumit Mohanty 
wrote:

> Yes, the /agent one can be removed.
>
> On Tue, Jan 6, 2015 at 7:26 AM, Jon Maron  wrote:
>
> >
> > On Jan 6, 2015, at 9:54 AM, Steve Loughran 
> wrote:
> >
> > > > https://issues.apache.org/jira/browse/SLIDER-641 covers the fact that we
> > > > have two slider-agent.tar.gz files in the tar, one in /lib, the other in
> > > > /agent.
> > > >
> > > > I want to cut one: which one should I keep? I'm assuming the agent one
> >
> > +1
> >
> > >
> > > --
> > > CONFIDENTIALITY NOTICE
> > > NOTICE: This message is intended for the use of the individual or
> entity
> > to
> > > which it is addressed and may contain information that is confidential,
> > > privileged and exempt from disclosure under applicable law. If the
> reader
> > > of this message is not the intended recipient, you are hereby notified
> > that
> > > any printing, copying, dissemination, distribution, disclosure or
> > > forwarding of this communication is strictly prohibited. If you have
> > > received this communication in error, please contact the sender
> > immediately
> > > and delete it from your system. Thank You.
> >
> >
> >
>
>
>
> --
> thanks
> Sumit
>


Re: Locality results in instance shut-down due to single bad instance

2015-01-06 Thread Gour Saha

Nitin,

I don't think we have logic that applies data locality first and then, after
a certain number of failures (a threshold), tries "no data locality" at least
once before giving up. It would be a good idea to file a JIRA for this
requirement.

-Gour


On Tue, Jan 6, 2015 at 5:12 PM, Nitin Aggarwal  wrote:

> I am running an HBase application, and I prefer data locality. I don't want to
> give up locality by default. It's ok to lose locality in rare scenarios,
> where something is wrong with one of the local nodes.
> It's more of a fail-safe that I am looking for, to give up locality if it
> cannot be satisfied.
>
> Thanks
> Nitin
>
>
> On Tue, Jan 6, 2015 at 4:52 PM, Ted Yu  wrote:
>
> > Here is the meaning of 2 (see PlacementPolicy):
> >
> >   /**
> >    * No data locality; do not bother trying to ask for any location
> >    */
> >   public static final int NO_DATA_LOCALITY = 2;
> >
> > On Tue, Jan 6, 2015 at 4:15 PM, Gour Saha  wrote:
> >
> > > Try setting property *yarn.component.placement.policy* to 2 for the
> > > component, something like this -
> > >
> > > "HBASE_MASTER": {
> > >   "yarn.role.priority": "1",
> > >   "yarn.component.instances": "1",
> > >   "yarn.memory": "1500",
> > >   "yarn.component.placement.policy": "2"
> > > },
> > >
> > > -Gour
> > >
> > > On Tue, Jan 6, 2015 at 3:33 PM, Nitin Aggarwal <
> > > nitin3588.aggar...@gmail.com
> > > > wrote:
> > >
> > > > Hi,
> > > >
> > > > We keep running into a scenario where one of the nodes in the cluster
> > > > goes bad (clock out of sync, no disk space, etc.). As a result the
> > > > container fails to start and, due to locality, it is assigned to the
> > > > same machine again and again, and it fails again and again. After a few
> > > > failures, when the failure threshold is reached (which is currently
> > > > also not reset correctly, see SLIDER-629), it triggers instance
> > > > shut-down.
> > > >
> > > > Is there a way to give up locality after multiple failures, to avoid
> > > > this scenario?
> > > >
> > > > Thanks
> > > > Nitin Aggarwal
> > > >
> > >
> > >
> >
>


Re: Locality results in instance shut-down due to single bad instance

2015-01-08 Thread Gour Saha

+1 on that

That's also what I meant when I said - 
>> I don't think we have logic that applies data locality first and then, after
>> a certain number of failures (a threshold), tries "no data locality" at least
>> once before giving up. It would be a good idea to file a JIRA for this
>> requirement.

-Gour


> On Jan 8, 2015, at 3:30 AM, Steve Loughran  wrote:
> 
> thinking about this some more, we could use our tracking of node
> reliability to tune our placement decisions.
> 
> 
>   1. We add a "recent failures" field to the node entries, alongside the
>   "total failures"
>   2. Our scheduled failure count resetter will set that field to zero,
>   alongside the component failures
>   3. When Slider has to request a new container, unless the placement
>   policy is STRICT, we will continue to use the (persisted) placement history
>   4. Except now, if a node has a recent failure count above some
>   threshold, we don't ask for a container on that node...we just ask for
>   "anywhere" placement.
> 
> What do people think?
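
A minimal sketch of the four steps proposed above, written in Java. Everything here is hypothetical: the class `NodeFailureTracker`, its method names and the threshold parameter are invented for illustration and are not Slider's `NodeEntry` or `RoleHistory` API. The sketch also assumes one reading of step 4: a node over the threshold is skipped and the next entry in the placement history is tried, and only when none is left does the request become "anywhere" (returned as `null`).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of per-node failure counters; not Slider's NodeEntry. */
final class NodeFailureTracker {

    /** Lifetime and recent failure counts for a single node. */
    private static final class Counts {
        int total;
        int recent;
    }

    private final Map<String, Counts> nodes = new HashMap<>();
    private final int threshold; // assumed tunable; no real Slider option is implied

    NodeFailureTracker(int threshold) {
        this.threshold = threshold;
    }

    /** Step 1: every failure bumps both the total and the recent counter. */
    void recordFailure(String host) {
        Counts c = nodes.computeIfAbsent(host, h -> new Counts());
        c.total++;
        c.recent++;
    }

    /** Step 2: the scheduled resetter zeroes the recent counters only. */
    void resetRecentFailures() {
        for (Counts c : nodes.values()) {
            c.recent = 0;
        }
    }

    int recentFailures(String host) {
        Counts c = nodes.get(host);
        return c == null ? 0 : c.recent;
    }

    int totalFailures(String host) {
        Counts c = nodes.get(host);
        return c == null ? 0 : c.total;
    }

    /**
     * Steps 3 and 4: walk the placement history (most recent first) and
     * return the first host that is still trusted. A STRICT policy always
     * honours the history. Returning null means "ask for anywhere".
     */
    String chooseTarget(List<String> history, boolean strict) {
        for (String host : history) {
            if (strict || recentFailures(host) <= threshold) {
                return host;
            }
        }
        return null;
    }
}
```

With a threshold of 2, three failures on a node are enough for a non-strict component to stop asking for it until the next scheduled reset, while the lifetime total is kept for reporting.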
> 
>> On 7 January 2015 at 09:50, Steve Loughran  wrote:
>> 
>> the history of where things were is retained in the RoleHistory
>> structures, persisted to HDFS and reread on startup. for each component
>> type, it's sorted by most-recent-first.
>> 
>> When a container is needed, the AM looks in that history first, and looks
>> through the list of "previously used nodes for that component type",
>> skipping any that already have an instance of that component running. The
>> chosen node is taken off the list, so there are no duplicates
>> (exception: the component type doesn't have any locality, in which case
>> although the history is tracked, it's not used for placement).
>> 
>> 
>> 
>> When a placement on the node comes in, it's taken off the "pending
>> list".
>> 
>> There's one small issue here: no way to tie requests to allocations. We
>> don't really care which request allocates a component to a node, we just
>> like to track outstanding requests for explicit nodes. The algorithm is:
>> - allocation to a requested node: remove node from "list of outstanding
>>   explicit requests"
>> - allocation to another node: do nothing while there are outstanding
>>   requests
>> - all outstanding requests satisfied: clean the list of outstanding
>>   "placed" requests.
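
The three rules above can be sketched roughly as follows. This is an illustration under assumptions, not the Slider implementation: `OutstandingRequests` and its methods are hypothetical names, and "all outstanding requests satisfied" is interpreted as a simple counter of unsatisfied requests reaching zero.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tracker for requests that named an explicit node. */
final class OutstandingRequests {
    private final Set<String> placedHosts = new HashSet<>();
    private int outstanding; // all unsatisfied requests, placed or not

    /** A request that asks for a specific, previously used node. */
    void requestOnHost(String host) {
        placedHosts.add(host);
        outstanding++;
    }

    /** A request with no location preference. */
    void requestAnywhere() {
        outstanding++;
    }

    /** An allocation arrived; it cannot be tied back to a specific request. */
    void onAllocation(String host) {
        if (outstanding > 0) {
            outstanding--;
        }
        // Rule 1: allocation on a requested node removes it from the list.
        // Rule 2: allocation elsewhere leaves the list untouched (no-op remove).
        placedHosts.remove(host);
        // Rule 3: once nothing is outstanding, clean out any leftover entries.
        if (outstanding == 0) {
            placedHosts.clear();
        }
    }

    boolean isPlacedRequestPending(String host) {
        return placedHosts.contains(host);
    }

    int outstandingCount() {
        return outstanding;
    }
}
```

The third rule is what stops a stale entry from lingering when YARN satisfies a placed request on a different node than the one asked for.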
>> 
>> Now, fun happens when a container fails on a newly allocated node, and it's
>> here there may be some policy tuning required.
>> 
>> It comes down to this: what is the best way to react when a component
>> fails to start, either immediately, or shortly after startup? This can be a
>> sign of a major problem ("node doesn't run my app"), or something transient
>> ("port still considered in use").
>> 
>> If it's a transient problem, there's no harm in asking again.
>> 
>> If it's a permanent problem, we need to make the decision that this node is
>> bad, at least for that specific component.
>> 
>> I think right now, on a startup/launch time failure, the failing node is
>> placed at the back of the list of recently used nodes; the failure counts
>> of both the node and the component incremented. Although there's a YARN API
>> where an application can provide blacklist hints to YARN, we're not
>> currently using it.
>> 
>> I think what you may be seeing is that Slider is repeatedly asking for the
>> same node: it's failing and going to the back of the list of previously
>> used nodes, but as there is only one, it's being asked for again.
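
To see why a single-entry history keeps producing the same request, here is a small sketch of the "failed node goes to the back of the list" behaviour described above. `RecentNodes` and its methods are hypothetical names chosen for illustration; the real history lives in Slider's RoleHistory structures and is more involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical most-recent-first list of nodes used by one component type. */
final class RecentNodes {
    private final Deque<String> recent = new ArrayDeque<>();

    /** A successful use moves the node to the front of the list. */
    void markUsed(String host) {
        recent.remove(host);
        recent.addFirst(host);
    }

    /** A launch failure demotes a known node to the back of the list. */
    void markLaunchFailed(String host) {
        if (recent.remove(host)) {
            recent.addLast(host);
        }
    }

    /** The node the next placed request would ask for, or null if none. */
    String nextCandidate() {
        return recent.peekFirst();
    }
}
```

With two nodes in the history, a failure on the preferred one makes the other the next candidate. With only one node, the back of the list is also the front, so the failing node is asked for again.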
>> 
>> We can tune this -maybe- but it gets complex.
>> 
>> 1. If the placement policy is STRICT, then we must ask for that previously
>> used node. (Though thinking about it, the component must have started at
>> least once at some point in the past...I don't know if the special case of
>> "previously allocated but never started" is detected and handled)
>> 
>> 2. If the placement is location-preferred (the default), how best to react to a
>> launch failure? Completely cut that node off the list of suitable targets?
>> Or try again a few more times? If it's a transient problem, retry gives
>> locality without over-reacting. If it's a permanent problem, then retrying
>> is the wrong policy.
>> 
>> What should we do here? We are tracking failures in NodeEntry entries, in
>> a map of the cluster built up (NodeMap), but not currently using failure

Planning Slider Release 0.70 - early Feb

2015-01-20 Thread Gour Saha

Folks,

I am planning to cut a 0.70 release for Slider by early Feb. The code
freeze date is set at Jan 31st.

If there are any specific JIRAs that need to get into the 0.70 release,
please set Fix Version to "Slider 0.70" and reply to this dev group.

-Gour


Re: updating the release process; what to do about branches

2015-01-28 Thread Gour Saha

The most common use I have seen for the master branch is to signify that
it contains the most recent release of the product. The individual release
branches would still be there, like releases/slider-0.60,
releases/slider-0.70, etc. But anyone who just needs the latest release
would not have to look around to find the latest branch or tag; they would
just check out master. Very old release branches can be purged from time
to time.

*A typical project lifecycle, from a branching point of view, could be
something like this -*
Development happens in the develop branch. There would typically be a bunch
of features developed in specific feature branches, and they would be merged
into develop. A release branch is then created off of develop when the code
freeze date for a specific release is reached. From then on, typically only
bug fixes are taken into the release branch, based on QE findings. The
develop branch is open at this point for regular development for the next
releases. Once the release branch is stable and free of must-fix bugs, it is
ready to be released. All release work is done off of the release branch.
Once the global announcement is made and binaries are released to the world,
the master branch is synced with this latest release branch.

This document provides more details and is used by quite a few companies
for internal development -
http://nvie.com/posts/a-successful-git-branching-model/

Thoughts and comments on this strategy?

-Gour


On Wed, Jan 28, 2015 at 10:02 AM, Josh Elser  wrote:

> Could also just remove "master" in its current use and s/develop/master/,
> leveraging the master branch as the normal place things are implemented.
>
> It really doesn't matter in the end (it's just a name), but, if this is
> also signifying a move away from git-flow, it makes more sense to me to use
> "master" instead of "develop".
>
>
> Sumit Mohanty wrote:
>
>> I vote for removing the master branch. This is in the line of what I was
>> also wondering since we have created branches for 0.60 and 0.70. Branches
>> can remain the source of truth for the release and can facilitate minor
>> releases if needed.
>>
>> On Wed, Jan 28, 2015 at 8:58 AM, Steve Loughran
>> wrote:
>>
>>  The latest release process document is now in svn at
>>> site/trunk/content/developing/releasing.md
>>>
>>> It hasn't yet propagated to the HTML view, when it does it will be at
>>>
>>> http://slider.incubator.apache.org/developing/releasing.html
>>>
>>> I think we've outgrown the git flow release process.
>>>
>>> The feature branch seems to work well, but the release process has
>>> everything merged into the branch "master",
>>>
>>> - It doesn't handle long-lived release/supported branches
>>> - Merging into master/ can create convoluted dependency graphs,
>>> resulting in a commit graph (and hence git commit ID) which is different
>>> from what is released.
>>>
>>> What are we to do?
>>>
>>> I'm wondering if we should get rid of that master/ branch altogether.
>>>
>>> Instead we could have some tags which we could move around:
>>>
>>> - last_branch_6_stable_release
>>> - last_branch_6_dev_release
>>> - last_branch_7_stable_release
>>> - last_branch_7_dev_release
>>> - last_stable_release
>>> - last_dev_release
>>>
>>> If you fetch all tags then check out by tag, you end with whatever
>>> version
>>> we think is "last" on a branch; the stable/dev releases can even cross
>>> branches as something migrates from development to stable
>>>
>>> During the release process, instead of doing git merge master work, we'd
>>> just delete some tags, create the new ones and then push them to the
>>> origin.
>>>
>>> Thoughts?
>>>
>>> -steve
>>>
>>>
>>>
>>
>>
>>


A new branch created

2015-02-03 Thread Gour Saha

Just FYI.. A new branch releases/slider-0.61.0-incubating has been created 
corresponding to the tag slider-0.61.0-incubating.

-Gour

Re: [VOTE] Move Slider to JDK7+

2015-02-20 Thread Gour Saha

+1

On 2/20/15, 2:00 PM, "Sumit Mohanty"  wrote:

>Given that Java 6 is no longer supported (even Java 7 will not be
>supported
>after Spring 2015) and Hadoop 2.7+ is JDK7+, we should have Slider move to
>JDK7+.
>
>This is a call to vote to have Slider move to JDK7+ after the upcoming
>0.70
>release. As 0.70 already has its release branch, we can have the develop
>branch moved to JDK7 as soon as the vote passes.
>
>The vote will be open for at least 72 hours.
>
>[ ] +1  approve
>[ ] -1  disapprove (and reason why)
>
>To start a +1 from me.
>
>Thanks
>Sumit

Re: Intermittent issues accessing zookeeper

2015-02-25 Thread Gour Saha

Can you check the zk logs at /var/log/zookeeper/zookeeper.out and see if
you find something?

Also, can you use the zkCli.sh client to query in a loop for a few
minutes (with a few seconds between queries) and see if you get
similar intermittent connection issues?

-Gour

On 2/25/15, 6:53 AM, "Jon Maron"  wrote:

>I've noticed that I'm having intermittent issues accessing the zookeeper
>quorum during "destroy" attempts:
>
>2015-02-25 09:48:02,345 [main] WARN  client.SliderClient (SliderClient.java:getZkClient(523)) - Unable to connect to zookeeper quorum c6402.ambari.apache.org:2181,c6404.ambari.apache.org:2181,c6403.ambari.apache.org:2181,c6405.ambari.apache.org:2181
>java.net.ConnectException: Unable to connect to ZK quorum
>   at org.apache.slider.core.zk.BlockingZKWatcher.waitForZKConnection(BlockingZKWatcher.java:63)
>   at org.apache.slider.client.SliderClient.getZkClient(SliderClient.java:518)
>   at org.apache.slider.client.SliderClient.deleteZookeeperNode(SliderClient.java:458)
>   at org.apache.slider.client.SliderClient.actionDestroy(SliderClient.java:550)
>   at org.apache.slider.client.SliderClient.exec(SliderClient.java:383)
>   at org.apache.slider.client.SliderClient.runService(SliderClient.java:348)
>   at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
>   at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
>   at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
>   at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
>   at org.apache.slider.Slider.main(Slider.java:49)
>2015-02-25 09:48:02,656 [main] DEBUG client.SliderClient (SliderClient.java:deleteZookeeperNode(474)) - Unable to recursively delete zk node /services/slider/users/jmaron/hbase-test
>2015-02-25 09:48:02,656 [main] DEBUG client.SliderClient (SliderClient.java:deleteZookeeperNode(475)) - Reason:
>org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /services/slider/users/jmaron/hbase-test
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
>   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
>   at org.apache.slider.core.zk.ZKIntegration.stat(ZKIntegration.java:164)
>   at org.apache.slider.core.zk.ZKIntegration.exists(ZKIntegration.java:160)
>   at org.apache.slider.client.SliderClient.deleteZookeeperNode(SliderClient.java:460)
>   at org.apache.slider.client.SliderClient.actionDestroy(SliderClient.java:550)
>   at org.apache.slider.client.SliderClient.exec(SliderClient.java:383)
>   at org.apache.slider.client.SliderClient.runService(SliderClient.java:348)
>   at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
>   at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
>   at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
>   at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
>   at org.apache.slider.Slider.main(Slider.java:49)
>
>Any ideas on why that may occur?  My cluster is running on a set of VMs
>on my development box.  These failed ZK interactions will subsequently
>yield issues in trying to recreate the given application (in this case
>HBase)
>
>- Jon

Re: Intermittent issues accessing zookeeper

2015-02-25 Thread Gour Saha

Yup, those logs are normal. Are all your 3 nodes healthy? Not sure what
could be causing it.

On 2/25/15, 7:28 AM, "Jon Maron"  wrote:

>
>> On Feb 25, 2015, at 10:16 AM, Gour Saha  wrote:
>> 
>> Can you check the zk logs at /var/log/zookeeper/zookeeper.out and see if
>> you find something?
>
>I see a bunch of these but I’m assuming these are normal for a
>disconnected client connection:
>
>2015-02-23 19:40:21,320 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.64.105:34018 (no session established for client)
>2015-02-23 19:41:21,311 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.64.105:34031
>2015-02-23 19:41:21,319 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
>EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
>   at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>   at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>   at java.lang.Thread.run(Thread.java:745)
>2015-02-23 19:41:21,319 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.64.105:34031 (no session established for client)
>2015-02-23 19:41:52,896 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.64.104:46949
>2015-02-23 19:41:52,896 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /192.168.64.104:46949
>2015-02-23 19:41:52,900 - INFO  [CommitProcessor:4:ZooKeeperServer@617] - Established session 0x44bb7e82d730002 with negotiated timeout 1 for client /192.168.64.104:46949
>2015-02-23 19:41:52,916 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.64.104:46949 which had sessionid 0x44bb7e82d730002
>2015-02-23 19:42:21,313 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.64.105:34054
>2015-02-23 19:42:21,314 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
>EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
>   at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>   at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>   at java.lang.Thread.run(Thread.java:745)
>2015-02-23 19:42:21,314 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.64.105:34054 (no session established for client)
>2015-02-23 19:42:38,263 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.64.1:52286
>2015-02-23 19:42:38,265 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /192.168.64.1:52286
>2015-02-23 19:42:38,269 - INFO  [CommitProcessor:4:ZooKeeperServer@617] - Established session 0x44bb7e82d730003 with negotiated timeout 4 for client /192.168.64.1:52286
>2015-02-23 19:42:39,316 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.64.1:52286 which had sessionid 0x44bb7e82d730003
>2015-02-23 19:43:14,665 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.64.105:34129
>2015-02-23 19:43:14,667 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /192.168.64.105:34129
>2015-02-23 19:43:14,672 - INFO  [CommitProcessor:4:ZooKeeperServer@617] - Established session 0x44bb7e82d730004 with negotiated timeout 1 for client /192.168.64.105:34129
>2015-02-23 19:43:14,681 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
>EndOfStreamException: Unable to read additional data from client sessionid 0x44bb7e82d730004, likely client has closed socket
>   at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>   at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>   at java.lang.Thread.run(Thread.java:745)
>
>
>> 
>> Also, see if you can u

[VOTE] Apache Slider Incubating Release 0.70.0-incubating

2015-03-06 Thread Gour Saha

Hello,

This is a call for a vote on Apache Slider Incubating 0.70.0-incubating release.

This is a source+binary release.

The issues fixed in this release are listed at:
https://issues.apache.org/jira/browse/SLIDER/fixforversion/12327847 (or the 
shortened URL http://s.apache.org/AnM)


Artifacts at
https://repository.apache.org/content/repositories/orgapacheslider-1004/org/apache/slider


Git source tag:
https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=tag;h=refs/tags/slider-0.70.0-incubating


PGP keys at
http://pgp.mit.edu/pks/lookup?op=vindex&search=gourks...@apache.org


Build instructions at:
http://slider.incubator.apache.org/developing/building.html


Vote will be open for 72 hours

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)


To start, here's my vote: +1

-Gour

Re: [VOTE] Apache Slider Incubating Release 0.70.0-incubating

2015-03-09 Thread Gour Saha

Josh,
We are not past 72 hours yet. The vote is open until the end of today.

-Gour

> On Mar 8, 2015, at 10:02 PM, "Josh Elser"  wrote:
> 
> I just saw this sitting in my inbox. After traveling this weekend, I took the 
> rest of it to myself. As such, I think I've already missed the time window. 
> I'm sorry about that Gour.
> 
> Might we be able to extend the voting window through tmrw evening?
> 
> Gour Saha wrote:
>> Hello,
>> 
>> This is a call for a vote on Apache Slider Incubating 0.70.0-incubating 
>> release.
>> 
>> This is a source+binary release.
>> 
>> The issues fixed in this release are listed at:
>> https://issues.apache.org/jira/browse/SLIDER/fixforversion/12327847 (or the 
>> shortened URL http://s.apache.org/AnM)
>> 
>> 
>> Artifacts at
>> https://repository.apache.org/content/repositories/orgapacheslider-1004/org/apache/slider
>> 
>> 
>> Git source tag:
>> https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=tag;h=refs/tags/slider-0.70.0-incubating
>> 
>> 
>> PGP keys at
>> http://pgp.mit.edu/pks/lookup?op=vindex&search=gourks...@apache.org
>> 
>> 
>> Build instructions at:
>> http://slider.incubator.apache.org/developing/building.html
>> 
>> 
>> Vote will be open for 72 hours
>> 
>> [ ] +1 approve
>> [ ] +0 no opinion
>> [ ] -1 disapprove (and reason why)
>> 
>> 
>> To start, here's my vote: +1
>> 
>> -Gour
>>

Re: [VOTE] Apache Slider Incubating Release 0.70.0-incubating

2015-03-09 Thread Gour Saha

Thanks Josh. Sure thing, I will provide the hash.

-Gour

On 3/9/15, 6:06 PM, "Josh Elser"  wrote:

>+1
>
>* Checked hash/sigs
>* Fetched Gour's key
>* Built from tag (a8919c84)
>* Ran rat check on source release
>* Verified issues from IPMC vote on 0.61.0 were resolved
>
>One note, make sure that the source tag identifier/URL provided to IPMC
>is the SHA1 and not a logical name (as that isn't guaranteed unique).
>
>Gour Saha wrote:
>> Hello,
>>
>> This is a call for a vote on Apache Slider Incubating 0.70.0-incubating
>>release.
>>
>> This is a source+binary release.
>>
>> The issues fixed in this release are listed at:
>> https://issues.apache.org/jira/browse/SLIDER/fixforversion/12327847 (or
>>the shortened URL http://s.apache.org/AnM)
>>
>>
>> Artifacts at
>> 
>>https://repository.apache.org/content/repositories/orgapacheslider-1004/o
>>rg/apache/slider
>>
>>
>> Git source tag:
>> 
>>https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=tag;h=re
>>fs/tags/slider-0.70.0-incubating
>>
>>
>> PGP keys at
>> http://pgp.mit.edu/pks/lookup?op=vindex&search=gourks...@apache.org
>>
>>
>> Build instructions at:
>> http://slider.incubator.apache.org/developing/building.html
>>
>>
>> Vote will be open for 72 hours
>>
>> [ ] +1 approve
>> [ ] +0 no opinion
>> [ ] -1 disapprove (and reason why)
>>
>>
>> To start, here's my vote: +1
>>
>> -Gour
>>

[RESULTS] [VOTE] Apache Slider Incubating Release 0.70.0-incubating

2015-03-11 Thread Gour Saha

The vote passes with 6 +1 votes

杨浩 (yangha...@gmail.com) +1

Steve Loughran +1 (binding)

Jon Maron  +1

Josh Elser +1

Ted Yu +1

Gour Saha  +1

Mail Thread:
http://mail-archives.apache.org/mod_mbox/incubator-slider-dev/201503.mbox/%
3cd11fbaf8.a971%25gs...@hortonworks.com%3E


-Gour


On 3/6/15, 8:13 PM, "Gour Saha"  wrote:

>Hello,
>
>This is a call for a vote on Apache Slider Incubating 0.70.0-incubating
>release.
>
>This is a source+binary release.
>
>The issues fixed in this release are listed at:
>https://issues.apache.org/jira/browse/SLIDER/fixforversion/12327847 (or
>the shortened URL http://s.apache.org/AnM)
>
>
>Artifacts at
>https://repository.apache.org/content/repositories/orgapacheslider-1004/or
>g/apache/slider
>
>
>Git source tag:
>https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=tag;h=ref
>s/tags/slider-0.70.0-incubating
>
>
>PGP keys at
>http://pgp.mit.edu/pks/lookup?op=vindex&search=gourks...@apache.org
>
>
>Build instructions at:
>http://slider.incubator.apache.org/developing/building.html
>
>
>Vote will be open for 72 hours
>
>[ ] +1 approve
>[ ] +0 no opinion
>[ ] -1 disapprove (and reason why)
>
>
>To start, here's my vote: +1
>
>-Gour

Re: [RESULTS] [VOTE] Apache Slider Incubating Release 0.70.0-incubating

2015-03-11 Thread Gour Saha

Sorry for the broken mail thread link -
http://mail-archives.apache.org/mod_mbox/incubator-slider-dev/201503.mbox/%3cd11fbaf8.a971%25gs...@hortonworks.com%3E

-Gour


On 3/11/15, 11:04 AM, "Gour Saha" <gs...@hortonworks.com> wrote:

The vote passes with 6 +1 votes

杨浩 (yangha...@gmail.com) +1

Steve Loughran +1 (binding)

Jon Maron  +1

Josh Elser +1

Ted Yu     +1

Gour Saha  +1

Mail Thread:
http://mail-archives.apache.org/mod_mbox/incubator-slider-dev/201503.mbox/%
3cd11fbaf8.a971%25gs...@hortonworks.com%3E


-Gour


On 3/6/15, 8:13 PM, "Gour Saha" <gs...@hortonworks.com> wrote:

Hello,

This is a call for a vote on Apache Slider Incubating 0.70.0-incubating
release.

This is a source+binary release.

The issues fixed in this release are listed at:
https://issues.apache.org/jira/browse/SLIDER/fixforversion/12327847 (or
the shortened URL http://s.apache.org/AnM)


Artifacts at
https://repository.apache.org/content/repositories/orgapacheslider-1004/or
g/apache/slider


Git source tag:
https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=tag;h=ref
s/tags/slider-0.70.0-incubating


PGP keys at
http://pgp.mit.edu/pks/lookup?op=vindex&search=gourks...@apache.org


Build instructions at:
http://slider.incubator.apache.org/developing/building.html


Vote will be open for 72 hours

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)


To start, here's my vote: +1

-Gour

[VOTE] Release Apache Slider 0.70.0-incubating

2015-03-11 Thread Gour Saha

Hello,

This is a call for a vote for releasing Apache Slider 0.70.0-incubating.

This is a source+binary release with one .tar file (appdef_1.tar), which is a
text file used for negative testing.

Summary of fixes: http://s.apache.org/AnM
Vote thread: http://s.apache.org/YQx
Results: http://s.apache.org/fFH

Staged artifacts:
https://repository.apache.org/content/repositories/orgapacheslider-1004/org/apache/slider/

Git Source:
https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h=a8919c847547f0f0db74d76f67f06e1d423a61d3
SHA1: a8919c847547f0f0db74d76f67f06e1d423a61d3
Tag: slider-0.70.0-incubating
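
A reviewer on the dev list asked that the source be identified by its SHA1 rather than by a logical name, since a name is not guaranteed unique. A minimal sketch of checking that an identifier is a full commit id, assuming a full Git SHA-1 is written as 40 hexadecimal digits; the function name is_full_sha1 is hypothetical and only for illustration:

```python
import re

# A full Git SHA-1 object name is 40 hexadecimal characters.
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def is_full_sha1(identifier: str) -> bool:
    """Return True only for a complete 40-hex-digit commit id."""
    return FULL_SHA1.fullmatch(identifier.lower()) is not None


# The commit id, the tag name and the abbreviated id quoted in this thread.
for candidate in (
    "a8919c847547f0f0db74d76f67f06e1d423a61d3",
    "slider-0.70.0-incubating",
    "a8919c84",
):
    print(candidate, is_full_sha1(candidate))
```

Only the first candidate passes; the tag name and the 8-character abbreviation are rejected.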

PGP key:
http://pgp.mit.edu/pks/lookup?op=vindex&search=gourks...@apache.org

Basic build/test instructions:
http://slider.incubator.apache.org/developing/building.html

Please vote on releasing this package as Apache Slider 0.70.0-incubating.

This vote will be open for 72 hours.

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

Thank You,
The Apache Slider Team


[VOTE] Apache Slider Incubating Release 0.70.1-incubating

2015-03-16 Thread Gour Saha

Hello,

This is a call for a vote on Apache Slider 0.70.1-incubating release.

This is a source+binary release. The following 2 issues were identified in 
release 0.70.0-incubating and have been fixed in this release.
https://issues.apache.org/jira/browse/SLIDER-815
https://issues.apache.org/jira/browse/SLIDER-814

The list of all issues fixed in 0.70:
https://issues.apache.org/jira/browse/SLIDER/fixforversion/12327847 (or the 
shortened URL http://s.apache.org/AnM)

Staged artifacts:
https://repository.apache.org/content/repositories/orgapacheslider-1005/org/apache/slider

Git source:
https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h=7f20707b71e53025deff33b4981310aedaa42d43
SHA1: 7f20707b71e53025deff33b4981310aedaa42d43
Tag: slider-0.70.1-incubating

PGP key:
http://pgp.mit.edu/pks/lookup?op=vindex&search=gourks...@apache.org

Build/test instructions at:
http://slider.incubator.apache.org/developing/building.html

Vote will be open for 72 hours

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

To start, here's my vote: +1

-Gour


[RESULT] [VOTE] Apache Slider Incubating Release 0.70.1-incubating

2015-03-23 Thread Gour Saha

The vote passes with 4 +1 votes

Steve Loughran  +1 (binding)
Josh Elser      +1
Ted Yu          +1
Gour Saha       +1

Mail Thread:
http://mail-archives.apache.org/mod_mbox/incubator-slider-dev/201503.mbox/%3CD12CAA07.AEB5%25gsaha%40hortonworks.com%3E

-Gour

From: Gour Saha <gs...@hortonworks.com>
Date: Monday, March 16, 2015 at 3:41 PM
To: "dev@slider.incubator.apache.org" <dev@slider.incubator.apache.org>
Subject: [VOTE] Apache Slider Incubating Release 0.70.1-incubating

Hello,

This is a call for a vote on Apache Slider 0.70.1-incubating release.

This is a source+binary release. The following 2 issues were identified in 
release 0.70.0-incubating and have been fixed in this release.
https://issues.apache.org/jira/browse/SLIDER-815
https://issues.apache.org/jira/browse/SLIDER-814

The list of all issues fixed in 0.70:
https://issues.apache.org/jira/browse/SLIDER/fixforversion/12327847 (or the 
shortened URL http://s.apache.org/AnM)

Staged artifacts:
https://repository.apache.org/content/repositories/orgapacheslider-1005/org/apache/slider

Git source:
https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h=7f20707b71e53025deff33b4981310aedaa42d43
SHA1: 7f20707b71e53025deff33b4981310aedaa42d43
Tag: slider-0.70.1-incubating

PGP key:
http://pgp.mit.edu/pks/lookup?op=vindex&search=gourks...@apache.org

Build/test instructions at:
http://slider.incubator.apache.org/developing/building.html

Vote will be open for 72 hours

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

To start, here's my vote: +1

-Gour


[VOTE] Release Apache Slider 0.70.1-incubating

2015-03-23 Thread Gour Saha

Hello,

This is a call for a vote for releasing Apache Slider 0.70.1-incubating.

This is a source+binary release with one .tar file (appdef_1.tar), which is a 
text file used for negative testing.

The following 2 issues were identified in release 0.70.0-incubating and have 
been fixed in this release -
https://issues.apache.org/jira/browse/SLIDER-815
https://issues.apache.org/jira/browse/SLIDER-814

Summary of fixes: http://s.apache.org/AnM
Vote thread: http://s.apache.org/OWo
Results: http://s.apache.org/Lhd

Staged artifacts:
https://repository.apache.org/content/repositories/orgapacheslider-1005/org/apache/slider

Git Source:
https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h=7f20707b71e53025deff33b4981310aedaa42d43
SHA1: 7f20707b71e53025deff33b4981310aedaa42d43
Tag: slider-0.70.1-incubating

PGP key:
http://pgp.mit.edu/pks/lookup?op=vindex&search=gourks...@apache.org

Basic build/test instructions:
http://slider.incubator.apache.org/developing/building.html

Please vote on releasing this package as Apache Slider 0.70.1-incubating.

This vote will be open for 72 hours.

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

Thank You,
The Apache Slider Team


Re: quick note about slider release

2015-03-24 Thread Gour Saha

Hi Ted,
Thank you for bringing this up. I will work with the team to incorporate your 
suggestion.

-Gour

From: Ted Dunning <ted.dunn...@gmail.com>
Date: Tuesday, March 24, 2015 at 9:44 AM
To: Gour Saha <gs...@hortonworks.com>
Subject: quick note about slider release

Just a quick note about Apache releases (relative to the Slider vote).

Officially, all Apache releases are source only.

Binary releases are not really releases, but simply conveniences for users.  It 
is good to scrutinize them carefully, but they are not what Apache really 
releases.

You may well know this already, but it doesn't hurt to use precise language in 
the call for votes.

[RESULT] [VOTE] Release Apache Slider 0.70.1-incubating

2015-03-27 Thread Gour Saha

The VOTE passes with 3 binding +1 votes from the IPMC and no -1s.

Jean-Baptiste Onofré  +1 (binding)
Billie Rinaldi        +1 (binding)
Steve Loughran        +1 (binding)
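
A minimal sketch of the tally behind this result, assuming the usual ASF majority-approval rule for releases (at least three binding +1 votes and more binding +1 than -1 votes). The function name and the tuple layout are hypothetical:

```python
def release_vote_passes(votes):
    """votes: iterable of (name, value, binding), with value in {+1, 0, -1}.

    Only binding votes count toward the result in this sketch.
    """
    binding_values = [value for _, value, binding in votes if binding]
    plus = sum(1 for value in binding_values if value == 1)
    minus = sum(1 for value in binding_values if value == -1)
    return plus >= 3 and plus > minus


# The three binding votes listed in this result mail.
ipmc_votes = [
    ("Jean-Baptiste Onofré", +1, True),
    ("Billie Rinaldi", +1, True),
    ("Steve Loughran", +1, True),
]
print(release_vote_passes(ipmc_votes))
```

Non-binding votes can be passed in as well; they are ignored by the check.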

Thanks to everyone for the votes.

Next Steps:
Push the release out to the mirrors and announce.

-Gour

From: Gour Saha <gs...@hortonworks.com>
Date: Monday, March 23, 2015 at 9:40 AM
To: "gene...@incubator.apache.org" <gene...@incubator.apache.org>
Cc: "dev@slider.incubator.apache.org" <dev@slider.incubator.apache.org>
Subject: [VOTE] Release Apache Slider 0.70.1-incubating

Hello,

This is a call for a vote for releasing Apache Slider 0.70.1-incubating.

This is a source+binary release with one .tar file (appdef_1.tar), which is a 
text file used for negative testing.

The following 2 issues were identified in release 0.70.0-incubating and have 
been fixed in this release -
https://issues.apache.org/jira/browse/SLIDER-815
https://issues.apache.org/jira/browse/SLIDER-814

Summary of fixes: http://s.apache.org/AnM
Vote thread: http://s.apache.org/OWo
Results: http://s.apache.org/Lhd

Staged artifacts:
https://repository.apache.org/content/repositories/orgapacheslider-1005/org/apache/slider

Git Source:
https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h=7f20707b71e53025deff33b4981310aedaa42d43
SHA1: 7f20707b71e53025deff33b4981310aedaa42d43
Tag: slider-0.70.1-incubating

PGP key:
http://pgp.mit.edu/pks/lookup?op=vindex&search=gourks...@apache.org

Basic build/test instructions:
http://slider.incubator.apache.org/developing/building.html

Please vote on releasing this package as Apache Slider 0.70.1-incubating.

This vote will be open for 72 hours.

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

Thank You,
The Apache Slider Team


[ANNOUNCE] Apache Slider 0.70.1-incubating release

2015-03-30 Thread Gour Saha

The Apache Slider team is proud to announce Apache Slider incubation release 
version 0.70.1-incubating.

Apache Slider (incubating) is a YARN application which deploys existing 
distributed applications on YARN, monitors them, and makes them larger or 
smaller as desired - even while the application is running.

The release artifacts are available at:
http://www.apache.org/dyn/closer.cgi/incubator/slider/0.70.1-incubating/

To use these artifacts, please use the following documentation:
http://slider.incubator.apache.org/docs/getting_started.html

We would like to thank all the contributors that made the release possible.

Regards,
The Slider Team

Re: [ANNOUNCE] Apache Slider 0.70.1-incubating release

2015-03-30 Thread Gour Saha

Hi Josh,

Is the paragraph below the disclaimer text you are referring to? If yes, then 
it is totally my bad. I think I referred to the last announce email, and 
completely missed cross-referencing it with what we have in slider's releasing 
web page.

Apache Slider is an effort undergoing incubation at The Apache Software 
Foundation (ASF), sponsored by the Apache Incubator PMC. Incubation is required 
of all newly accepted projects until a further review indicates that the 
infrastructure, communications, and decision making process have stabilized in 
a manner consistent with other successful ASF projects. While incubation status 
is not necessarily a reflection of the completeness or stability of the code, 
it does indicate that the project has yet to be fully endorsed by the ASF.

-Gour

On 3/30/15, 3:19 PM, "Josh Elser" <els...@apache.org> wrote:

Gour,

This announcement needs to include the incubating project disclaimer. I
forgot to add it myself on 0.60 :). Did you happen to copy this text
from somewhere in our docs that did not include the disclaimer as well?
I had thought I updated the site to include the necessary disclaimer.

It already went out, so there's not much to do now, but we do
need to make sure this doesn't keep happening.

Gour Saha wrote:
The Apache Slider team is proud to announce Apache Slider incubation release 
version 0.70.1-incubating.

Apache Slider (incubating) is a YARN application which deploys existing 
distributed applications on YARN, monitors them, and makes them larger or 
smaller as desired - even while the application is running.

The release artifacts are available at:
http://www.apache.org/dyn/closer.cgi/incubator/slider/0.70.1-incubating/

To use these artifacts, please use the following documentation:
http://slider.incubator.apache.org/docs/getting_started.html

We would like to thank all the contributors that made the release possible.

Regards,
The Slider Team

Re: [ANNOUNCE] Apache Slider 0.70.1-incubating release

2015-03-31 Thread Gour Saha

Sure thing. Thanks for pointing this out.

-Gour

On 3/30/15, 9:51 PM, "Josh Elser"  wrote:

>That's the one!
>
>FWIW, I had updated
>http://slider.incubator.apache.org/developing/releasing.html#announcing
>when I did this. Let's try to make sure we get this right the next time
>around. :)
>
>Gour Saha wrote:
>> Hi Josh,
>>
>> Is the paragraph below the disclaimer text you are referring to? If
>>yes, then it is totally my bad. I think I referred to the last announce
>>email, and completely missed cross-referencing it with what we have in
>>slider's releasing web page.
>>
>> Apache Slider is an effort undergoing incubation at The Apache Software
>>Foundation (ASF), sponsored by the Apache Incubator PMC. Incubation is
>>required of all newly accepted projects until a further review indicates
>>that the infrastructure, communications, and decision making process
>>have stabilized in a manner consistent with other successful ASF
>>projects. While incubation status is not necessarily a reflection of the
>>completeness or stability of the code, it does indicate that the project
>>has yet to be fully endorsed by the ASF.
>>
>> -Gour
>>
>> On 3/30/15, 3:19 PM, "Josh Elser" <els...@apache.org> wrote:
>>
>> Gour,
>>
>> This announcement needs to include the incubating project disclaimer. I
>> forgot to add it myself on 0.60 :). Did you happen to copy this text
>> from somewhere in our docs that did not include the disclaimer as well?
>> I had thought I updated the site to include the necessary disclaimer.
>>
>> It already went out, so there's not much to do now, but we do
>> need to make sure this doesn't keep happening.
>>
>> Gour Saha wrote:
>> The Apache Slider team is proud to announce Apache Slider incubation
>>release version 0.70.1-incubating.
>>
>> Apache Slider (incubating) is a YARN application which deploys existing
>>distributed applications on YARN, monitors them, and makes them larger
>>or smaller as desired - even while the application is running.
>>
>> The release artifacts are available at:
>> http://www.apache.org/dyn/closer.cgi/incubator/slider/0.70.1-incubating/
>>
>> To use these artifacts, please use the following documentation:
>> http://slider.incubator.apache.org/docs/getting_started.html
>>
>> We would like to thank all the contributors that made the release
>>possible.
>>
>> Regards,
>> The Slider Team
>>
>>
>>
>>

Re: Slider-develop - Build # 609 - Failure

2015-04-01 Thread Gour Saha

Looking into this. Filed - https://issues.apache.org/jira/browse/SLIDER-837

On 4/1/15, 2:49 PM, "Apache Jenkins Server" 
wrote:

>The Apache Jenkins build system has built Slider-develop (build #609)
>
>Status: Failure
>
>Check console output at https://builds.apache.org/job/Slider-develop/609/
>to view the results.

Re: Finding the host having Slider client

2015-04-03 Thread Gour Saha

Try this URI -
http://<ambari_host>:8080/api/v1/clusters/<cluster_name>/host_components?HostRoles/component_name=SLIDER


Replace <ambari_host> and <cluster_name> with your Ambari server host and cluster name.

-Gour
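
A minimal sketch of assembling that URI, assuming Ambari listens on port 8080 as in the mail. The function name and the example host and cluster values are hypothetical:

```python
from urllib.parse import quote


def slider_host_components_url(ambari_host, cluster_name, port=8080):
    """Build the host_components query that filters on the SLIDER component."""
    # Percent-encode the cluster name in case it contains spaces or slashes.
    path = "/api/v1/clusters/{}/host_components".format(quote(cluster_name, safe=""))
    query = "HostRoles/component_name=SLIDER"
    return "http://{}:{}{}?{}".format(ambari_host, port, path, query)


# Hypothetical values, for illustration only.
print(slider_host_components_url("ambari.example.com", "mycluster"))
```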

On 4/3/15, 8:24 AM, "Sumit Mohanty"  wrote:

>Just forwarded an email to the mailing list where Ambari folks replied to
>a
>similar question.
>
>On Fri, Apr 3, 2015 at 7:19 AM, Krishna Kishore Bonagiri <
>write2kish...@gmail.com> wrote:
>
>> Hi Sumit,
>>We deployed Hadoop using Ambari.
>>
>>  Steve,  I don't know how do labels help. Sitting on one of the
>>machines in
>> the cluster, I want to know which machine is Slider client installed, so
>> that I can start the Slider app on the second machine from the first
>>using
>> ssh or so remotely. Hope you have got my question now.
>>
>> Thanks,
>> Kishore
>>
>> On Fri, Apr 3, 2015 at 7:05 PM, Sumit Mohanty 
>> wrote:
>>
>> > How do you deploy the hadoop cluster?
>> >
>> > On Friday, April 3, 2015, Steve Loughran 
>>wrote:
>> >
>> > >
>> > > > On 3 Apr 2015, at 10:35, Krishna Kishore Bonagiri <
>> > > write2kish...@gmail.com > wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > >  We have developed a slider package, and using it from the Linux
>> shell
>> > > > command, to install, create, etc. I know that it can only be done
>> from
>> > > the
>> > > > node in the cluster where a Slider client is installed. But is
>>there
>> > way
>> > > > sitting on one node in the cluster, where all is Slider client
>> > installed?
>> > > >
>> > >
>> > > could you maybe do this with labels? label those machines in the
>>cluster
>> > > with everything you need, and set things up to go only there?
>> > >
>> > > >  Please help.
>> > > >
>> > > > Thanks,
>> > > > Kishore
>> > >
>> > >
>> >
>> > --
>> > thanks
>> > Sumit
>> >
>>
>
>
>
>-- 
>thanks
>Sumit

Re: Finding the host having Slider client

2015-04-05 Thread Gour Saha

You can do -
curl -u admin:<password> "<uri>"

Or just open that URI in the browser where you are already logged in to
Ambari.

-Gour
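
The same call can be sketched in Python. This assumes HTTP Basic authentication (which is what curl -u sends) and assumes the response is JSON with an "items" list whose entries carry HostRoles.host_name; that response shape is an assumption, not something confirmed in this thread. Both function names are hypothetical:

```python
import base64
import json
import urllib.request


def build_request(uri, user, password):
    """Create a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    return urllib.request.Request(
        uri, headers={"Authorization": "Basic " + token.decode("ascii")}
    )


def slider_hosts(response_text):
    """Extract host names, assuming {"items": [{"HostRoles": {"host_name": ...}}]}."""
    body = json.loads(response_text)
    return [item["HostRoles"]["host_name"] for item in body.get("items", [])]


# To actually send the request (needs a reachable Ambari server):
#   with urllib.request.urlopen(build_request(uri, "admin", password)) as resp:
#       print(slider_hosts(resp.read().decode("utf-8")))
```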

On 4/5/15, 5:40 PM, "Krishna Kishore Bonagiri" 
wrote:

>Hi Gour,
>  Thanks for the answer. I don't know how to issue this URI from the
>command line (shell). Can you please tell me how?
>
>Kishore
>
>On Sat, Apr 4, 2015 at 12:38 AM, Gour Saha  wrote:
>
>> Try this URI -
>> http://<ambari_host>:8080/api/v1/clusters/<cluster_name>/host_components?HostRoles/component_name=SLIDER
>>
>>
>> Replace the ambari_host and cluster_name
>>
>> -Gour
>>
>> On 4/3/15, 8:24 AM, "Sumit Mohanty"  wrote:
>>
>> >Just forwarded an email to the mailing list where Ambari folks replied
>>to
>> >a
>> >similar question.
>> >
>> >On Fri, Apr 3, 2015 at 7:19 AM, Krishna Kishore Bonagiri <
>> >write2kish...@gmail.com> wrote:
>> >
>> >> Hi Sumit,
>> >>We deployed Hadoop using Ambari.
>> >>
>> >>  Steve,  I don't know how do labels help. Sitting on one of the
>> >>machines in
>> >> the cluster, I want to know which machine is Slider client
>>installed, so
>> >> that I can start the Slider app on the second machine from the first
>> >>using
>> >> ssh or so remotely. Hope you have got my question now.
>> >>
>> >> Thanks,
>> >> Kishore
>> >>
>> >> On Fri, Apr 3, 2015 at 7:05 PM, Sumit Mohanty
>>
>> >> wrote:
>> >>
>> >> > How do you deploy the hadoop cluster?
>> >> >
>> >> > On Friday, April 3, 2015, Steve Loughran 
>> >>wrote:
>> >> >
>> >> > >
>> >> > > > On 3 Apr 2015, at 10:35, Krishna Kishore Bonagiri <
>> >> > > write2kish...@gmail.com > wrote:
>> >> > > >
>> >> > > > Hi,
>> >> > > >
>> >> > > >  We have developed a slider package, and using it from the
>>Linux
>> >> shell
>> >> > > > command, to install, create, etc. I know that it can only be
>>done
>> >> from
>> >> > > the
>> >> > > > node in the cluster where a Slider client is installed. But is
>> >>there
>> >> > way
>> >> > > > sitting on one node in the cluster, where all is Slider client
>> >> > installed?
>> >> > > >
>> >> > >
>> >> > > could you maybe do this with labels? label those machines in the
>> >>cluster
>> >> > > with everything you need, and set things up to go only there?
>> >> > >
>> >> > > >  Please help.
>> >> > > >
>> >> > > > Thanks,
>> >> > > > Kishore
>> >> > >
>> >> > >
>> >> >
>> >> > --
>> >> > thanks
>> >> > Sumit
>> >> >
>> >>
>> >
>> >
>> >
>> >--
>> >thanks
>> >Sumit
>>
>>

Re: Need help in starting storm on yarn using slider

2015-04-07 Thread Gour Saha

Can you take a screenshot of your RM UI and send it over? It is usually
available in a URI similar to http://c6410.ambari.apache.org:8088/cluster.
I am specifically interested in seeing the Cluster Metrics table.

-Gour

On 4/7/15, 10:17 AM, "Jon Maron"  wrote:

>
>> On Apr 7, 2015, at 1:14 PM, Jon Maron  wrote:
>> 
>> 
>>> On Apr 7, 2015, at 1:08 PM, Chackravarthy Esakkimuthu
>>> wrote:
>>> 
>>> Thanks for the reply guys!
>>> Container allocation happened successfully.
>>> 
>>> *RoleStatus{name='slider-appmaster', key=0, minimum=0, maximum=1,
>>> desired=1, actual=1,*
>>> *RoleStatus{name='STORM_UI_SERVER', key=2, minimum=0, maximum=1,
>>>desired=1,
>>> actual=1, *
>>> *RoleStatus{name='NIMBUS', key=1, minimum=0, maximum=1, desired=1,
>>> actual=1, *
>>> *RoleStatus{name='DRPC_SERVER', key=3, minimum=0, maximum=1, desired=1,
>>> actual=1, *
>>> *RoleStatus{name='SUPERVISOR', key=4, minimum=0, maximum=1, desired=1,
>>> actual=1,*
>>> 
>>> Also, have put some logs specific to a container.. (nimbus) Same set of
>>> logs available for other Roles also (except Supervisor which has only
>>>first
>>> 2 lines of below logs)
>>> 
>>> *Installing NIMBUS on container_e04_1427882795362_0070_01_02.*
>>> *Starting NIMBUS on container_e04_1427882795362_0070_01_02.*
>>> *Registering component container_e04_1427882795362_0070_01_02*
>>> *Requesting applied config for NIMBUS on
>>> container_e04_1427882795362_0070_01_02.*
>>> *Received and processed config for
>>> container_e04_1427882795362_0070_01_02___NIMBUS*
>>> 
>>> Does this result in any intermediate state?
>>> 
>>> @Maron, I didn't configure any port specifically. Do I need to?
>>> Also, I don't see any error message in the AM logs about a port conflict.
>> 
>> My only concern was whether you were actually accession the web UIs at
>>the correct host and port.  If you are then the next step is probably to
>>look at the actual storm/hbase logs.  You can use the "yarn logs
>>-applicationId .." command.
>
>*accessing* ;)  
>
>> 
>>> 
>>> Thanks,
>>> Chackra
>>> 
>>> 
>>> 
>>> On Tue, Apr 7, 2015 at 9:02 PM, Jon Maron 
>>>wrote:
>>> 
 
> On Apr 7, 2015, at 11:03 AM, Billie Rinaldi
>
 wrote:
> 
> One thing you can check is whether your system has enough resources
>to
> allocate all the containers the app needs.  You will see info like
>the
> following in the AM log (it will be logged multiple times over the
>life
 of
> the AM).  In this case, the master I requested was allocated but the
> tservers were not.
> RoleStatus{name='ACCUMULO_TSERVER', key=2, desired=2, actual=0,
> requested=2, releasing=0, failed=0, started=0, startFailed=0, completed=0,
> failureMessage=''}
> RoleStatus{name='ACCUMULO_MASTER', key=1, desired=1, actual=1, requested=0,
> releasing=0, failed=0, started=0, startFailed=0, completed=0,
> failureMessage=''}
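
A minimal sketch of scanning AM log text for roles that have fewer containers than requested, assuming the RoleStatus{key=value, ...} layout shown above. The function names are hypothetical and the sample text is the Accumulo example from this mail:

```python
import re

# One RoleStatus{...} entry; entries may wrap across several log lines.
ROLE = re.compile(r"RoleStatus\{(.*?)\}", re.DOTALL)
# key=value pairs, where the value is either quoted or a bare token.
FIELD = re.compile(r"(\w+)=('[^']*'|[^,\s}]+)")


def parse_role_status(log_text):
    """Return one dict of string fields per RoleStatus entry found."""
    return [
        {key: value.strip("'") for key, value in FIELD.findall(body)}
        for body in ROLE.findall(log_text)
    ]


def under_allocated(roles):
    """Names of roles whose actual count is below the desired count."""
    return [r["name"] for r in roles if int(r["actual"]) < int(r["desired"])]


SAMPLE = """
RoleStatus{name='ACCUMULO_TSERVER', key=2, desired=2, actual=0,
requested=2, releasing=0, failed=0, started=0, startFailed=0, completed=0,
failureMessage=''}
RoleStatus{name='ACCUMULO_MASTER', key=1, desired=1, actual=1, requested=0,
releasing=0, failed=0, started=0, startFailed=0, completed=0,
failureMessage=''}
"""

print(under_allocated(parse_role_status(SAMPLE)))
```

For the sample, only ACCUMULO_TSERVER is reported, which matches the description that the master was allocated but the tservers were not.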
 
 You can also check the "Scheduler" link on the RM Web UI to get a
sense of
 whether you are resource constrained.
 
 Are you certain that you are attempting to invoke the correct port?
The
 listening ports are dynamically allocated by Slider.
 
> 
> 
> On Tue, Apr 7, 2015 at 3:29 AM, Chackravarthy Esakkimuthu <
> chaku.mi...@gmail.com> wrote:
> 
>> Hi All,
>> 
>> I am new to Apache slider and would like to contribute.
>> 
>> Just to start with, I am trying out running "storm" and  "hbase" on
>>yarn
>> using slider following the guide :
>> 
>> 
>> 
 
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/running_applications_on_slider/index.html#Item1.1
>> 
>> In both (storm and hbase) the cases, the ApplicationMaster gets
>>launched
>> and still running, but the ApplicationMaster link not working, and
>>from
 AM
>> logs, I don't see any errors.
>> 
>> How do I debug from this? Please help me.
>> In case there is any other mail thread about this, please point it
>> out to me. Thanks in advance.
>> 
>> Thanks,
>> Chackra
>> 
 
 
>> 
>

Re: Need help in starting storm on yarn using slider

2015-04-07 Thread Gour Saha

Chackra sent the attachment directly to me. From what I see the cluster 
resources (memory and cores) are abundant. 

But I also see that only 1 app is running, which is the one we are trying to 
debug, and 5 containers are running. So definitely more containers than just the 
AM are running. 

Can you click on the app master link and copy paste the content of that page? 
No need for screen shot. Also please send your resources JSON file. 

-Gour

- Sent from my iPhone

> On Apr 7, 2015, at 11:01 AM, "Jon Maron"  wrote:
> 
> 
> On Apr 7, 2015, at 1:36 PM, Chackravarthy Esakkimuthu <chaku.mi...@gmail.com> wrote:
> 
> @Maron, I could not get the logs even though the application is still running.
> It's a 10 node cluster and I logged into one of the node and executed the 
> command :
> 
> sudo -u hdfs yarn logs -applicationId application_1427882795362_0070
> 15/04/07 22:56:09 INFO impl.TimelineClientImpl: Timeline service address: 
> http://$HOST:PORT/ws/v1/timeline/
> 15/04/07 22:56:09 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> /app-logs/hdfs/logs/application_1427882795362_0070does not have any log files.
> 
> Can you login to the cluster node and look at the logs directory (e.g. in HDP 
> install it would be under /hadoop/yarn/logs IIRC)?
> 
> 
> 
> @Gour, Please find the attachment.
> 
> On Tue, Apr 7, 2015 at 10:57 PM, Gour Saha <gs...@hortonworks.com> wrote:
> Can you take a screenshot of your RM UI and send it over? It is usually
> available in a URI similar to http://c6410.ambari.apache.org:8088/cluster.
> I am specifically interested in seeing the Cluster Metrics table.
> 
> -Gour
> 
>> On 4/7/15, 10:17 AM, "Jon Maron" <jma...@hortonworks.com> wrote:
>> 
>> 
>>> On Apr 7, 2015, at 1:14 PM, Jon Maron <jma...@hortonworks.com> wrote:
>>> 
>>> 
>>>> On Apr 7, 2015, at 1:08 PM, Chackravarthy Esakkimuthu <chaku.mi...@gmail.com> wrote:
>>>> 
>>>> Thanks for the reply guys!
>>>> Container allocation happened successfully.
>>>> 
>>>> *RoleStatus{name='slider-appmaster', key=0, minimum=0, maximum=1,
>>>> desired=1, actual=1,*
>>>> *RoleStatus{name='STORM_UI_SERVER', key=2, minimum=0, maximum=1,
>>>> desired=1,
>>>> actual=1, *
>>>> *RoleStatus{name='NIMBUS', key=1, minimum=0, maximum=1, desired=1,
>>>> actual=1, *
>>>> *RoleStatus{name='DRPC_SERVER', key=3, minimum=0, maximum=1, desired=1,
>>>> actual=1, *
>>>> *RoleStatus{name='SUPERVISOR', key=4, minimum=0, maximum=1, desired=1,
>>>> actual=1,*
>>>> 
>>>> Also, have put some logs specific to a container.. (nimbus) Same set of
>>>> logs available for other Roles also (except Supervisor which has only
>>>> first
>>>> 2 lines of below logs)
>>>> 
>>>> *Installing NIMBUS on container_e04_1427882795362_0070_01_02.*
>>>> *Starting NIMBUS on container_e04_1427882795362_0070_01_02.*
>>>> *Registering component container_e04_1427882795362_0070_01_02*
>>>> *Requesting applied config for NIMBUS on
>>>> container_e04_1427882795362_0070_01_02.*
>>>> *Received and processed config for
>>>> container_e04_1427882795362_0070_01_02___NIMBUS*
>>>> 
>>>> Does this result in any intermediate state?
>>>> 
>>>> @Maron, I didn't configure any port specifically. Do I need to?
>>>> Also, I don't see any error message in the AM logs about a port conflict.
>>> 
>>> My only concern was whether you were actually accession the web UIs at
>>> the correct host and port.  If you are then the next step is probably to
>>> look at the actual storm/hbase logs.  You can use the "yarn logs
>>> -applicationId .." command.
>> 
>> *accessing* ;)
>> 
>>> 
>>>> 
>>>> Thanks,
>>>> Chackra
>>>> 
>>>> 
>>>> 
>>>> On Tue, Apr 7, 2015 at 9:02 PM, Jon Maron <jma...@hortonworks.com> wrote:
>>>> 
>>>>> 
>>>>>> On Apr 7, 2015, at 11:03 AM, Billie Rinaldi <billie.rina...@gmail.com> wrote:
>>>>>> 
>>>>>> One thing you can check is whether your system has enough resources
>>>>

Re: Need help in starting storm on yarn using slider

2015-04-07 Thread Gour Saha

Sorry forgot that the AM link not working was the original issue.

Few more things -
- Seems like you have RM HA setup, right?
- Can you copy paste the complete link of the RM UI and the URL of
ApplicationMaster (the link which is broken) with actual hostnames?


-Gour

On 4/7/15, 11:43 AM, "Chackravarthy Esakkimuthu" 
wrote:

>Since 5 containers are running, which means that Storm daemons are already
>up and running?
>
>
>Actually the ApplicationMaster link is not working. It just blanks out
>printing the following :
>
>This is standby RM. Redirecting to the current active RM:
>http://:8088/proxy/application_1427882795362_0070/slideram
>
>
>And for resources.json, I dint make any change and used the copy of
>resources-default.json as follows:
>
>
>{
>
>  "schema" : "http://example.org/specification/v2.0.0",
>
>  "metadata" : {
>
>  },
>
>  "global" : {
>
>"yarn.log.include.patterns": "",
>
>"yarn.log.exclude.patterns": ""
>
>  },
>
>  "components": {
>
>"slider-appmaster": {
>
>  "yarn.memory": "512"
>
>},
>
>"NIMBUS": {
>
>  "yarn.role.priority": "1",
>
>  "yarn.component.instances": "1",
>
>  "yarn.memory": "2048"
>
>},
>
>"STORM_UI_SERVER": {
>
>  "yarn.role.priority": "2",
>
>  "yarn.component.instances": "1",
>
>  "yarn.memory": "1278"
>
>},
>
>"DRPC_SERVER": {
>
>  "yarn.role.priority": "3",
>
>  "yarn.component.instances": "1",
>
>  "yarn.memory": "1278"
>
>},
>
>"SUPERVISOR": {
>
>  "yarn.role.priority": "4",
>
>  "yarn.component.instances": "1",
>
>  "yarn.memory": "3072"
>
>}
>
>  }
>
>}
>
>
>
>On Tue, Apr 7, 2015 at 11:52 PM, Gour Saha  wrote:
>
>> Chackra sent the attachment directly to me. From what I see the cluster
>> resources (memory and cores) are abundant.
>>
>> But I also see that only 1 app is running which is the one we are trying
>> to debug and 5 containers are running. So definitely more containers
>> than just the AM are running.
>>
>> Can you click on the app master link and copy paste the content of that
>> page? No need for screen shot. Also please send your resources JSON
>>file.
>>
>> -Gour
>>
>> - Sent from my iPhone
>>
>> > On Apr 7, 2015, at 11:01 AM, "Jon Maron" 
>>wrote:
>> >
>> >
>> > On Apr 7, 2015, at 1:36 PM, Chackravarthy Esakkimuthu <
>> chaku.mi...@gmail.com<mailto:chaku.mi...@gmail.com>> wrote:
>> >
>> > @Maron, I could not get the logs even though the application is still
>> running.
>> > It's a 10 node cluster and I logged into one of the node and executed
>> the command :
>> >
>> > sudo -u hdfs yarn logs -applicationId application_1427882795362_0070
>> > 15/04/07 22:56:09 INFO impl.TimelineClientImpl: Timeline service
>> address: http://$HOST:PORT/ws/v1/timeline/
>> > 15/04/07 22:56:09 INFO client.ConfiguredRMFailoverProxyProvider:
>>Failing
>> over to rm2
>> > /app-logs/hdfs/logs/application_1427882795362_0070does not have any
>>log
>> files.
>> >
>> > Can you login to the cluster node and look at the logs directory (e.g.
>> in HDP install it would be under /hadoop/yarn/logs IIRC)?
>> >
>> >
>> >
>> > @Gour, Please find the attachment.
>> >
>> > On Tue, Apr 7, 2015 at 10:57 PM, Gour Saha > <mailto:gs...@hortonworks.com>> wrote:
>> > Can you take a screenshot of your RM UI and send it over? It is
>>usually
>> > available in a URI similar to
>> http://c6410.ambari.apache.org:8088/cluster.
>> > I am specifically interested in seeing the Cluster Metrics table.
>> >
>> > -Gour
>> >
>> >> On 4/7/15, 10:17 AM, "Jon Maron" > jma...@hortonworks.com>> wrote:
>> >>
>> >>
>> >>> On Apr 7, 2015, at 1:14 PM, Jon Maron
>>> jma...@hortonworks.com>> wrote:
>> >>>
>> >>>
>> >>>> On Apr 7, 2015, at 1:08 PM, Chackravarthy Esakkimuthu
>> &g

Re: Need help in starting storm on yarn using slider

2015-04-07 Thread Gour Saha

Which user are you running the slider create command as? Seems like you
are running as hdfs user. Is this a secured cluster?

-Gour

On 4/7/15, 1:06 PM, "Chackravarthy Esakkimuthu" 
wrote:

>yes, RM HA has been setup in this cluster.
>
>Active : zs-aaa-001.nm.flipkart.com
>Standby : zs-aaa-002.nm.flipkart.com
>
>RM Link : http://zs-aaa-001.nm.flipkart.com:8088/cluster/scheduler
><http://zs-exp-01.nm.flipkart.com:8088/cluster/scheduler>
>
>AM Link :
>http://zs-aaa-001.nm.flipkart.com:8088/proxy/application_1427882795362_007
>0/slideram
><http://zs-exp-01.nm.flipkart.com:8088/proxy/application_1427882795362_007
>0/slideram>
>
>On Wed, Apr 8, 2015 at 1:05 AM, Gour Saha  wrote:
>
>> Sorry forgot that the AM link not working was the original issue.
>>
>> Few more things -
>> - Seems like you have RM HA setup, right?
>> - Can you copy paste the complete link of the RM UI and the URL of
>> ApplicationMaster (the link which is broken) with actual hostnames?
>>
>>
>> -Gour
>>
>> On 4/7/15, 11:43 AM, "Chackravarthy Esakkimuthu" 
>> wrote:
>>
>> >Since 5 containers are running, which means that Storm daemons are
>>already
>> >up and running?
>> >
>> >
>> >Actually the ApplicationMaster link is not working. It just blanks out
>> >printing the following :
>> >
>> >This is standby RM. Redirecting to the current active RM:
>> >http://:8088/proxy/application_1427882795362_0070/slideram
>> >
>> >
>> >And for resources.json, I dint make any change and used the copy of
>> >resources-default.json as follows:
>> >
>> >
>> >{
>> >
>> >  "schema" : "http://example.org/specification/v2.0.0",
>> >
>> >  "metadata" : {
>> >
>> >  },
>> >
>> >  "global" : {
>> >
>> >"yarn.log.include.patterns": "",
>> >
>> >"yarn.log.exclude.patterns": ""
>> >
>> >  },
>> >
>> >  "components": {
>> >
>> >"slider-appmaster": {
>> >
>> >  "yarn.memory": "512"
>> >
>> >},
>> >
>> >"NIMBUS": {
>> >
>> >  "yarn.role.priority": "1",
>> >
>> >  "yarn.component.instances": "1",
>> >
>> >  "yarn.memory": "2048"
>> >
>> >},
>> >
>> >"STORM_UI_SERVER": {
>> >
>> >  "yarn.role.priority": "2",
>> >
>> >  "yarn.component.instances": "1",
>> >
>> >  "yarn.memory": "1278"
>> >
>> >},
>> >
>> >"DRPC_SERVER": {
>> >
>> >  "yarn.role.priority": "3",
>> >
>> >  "yarn.component.instances": "1",
>> >
>> >  "yarn.memory": "1278"
>> >
>> >},
>> >
>> >"SUPERVISOR": {
>> >
>> >  "yarn.role.priority": "4",
>> >
>> >  "yarn.component.instances": "1",
>> >
>> >  "yarn.memory": "3072"
>> >
>> >}
>> >
>> >  }
>> >
>> >}
>> >
>> >
>> >
>> >On Tue, Apr 7, 2015 at 11:52 PM, Gour Saha 
>>wrote:
>> >
>> >> Chackra sent the attachment directly to me. From what I see the
>>cluster
>> >> resources (memory and cores) are abundant.
>> >>
>> >> But I also see that only 1 app is running which is the one we are
>>trying
>> >> to debug and 5 containers are running. So definitely more containers
>> >> than just the AM are running.
>> >>
>> >> Can you click on the app master link and copy paste the content of
>>that
>> >> page? No need for screen shot. Also please send your resources JSON
>> >>file.
>> >>
>> >> -Gour
>> >>
>> >> - Sent from my iPhone
>> >>
>> >> > On Apr 7, 2015, at 11:01 AM, "Jon Maron" 
>> >>wrote:
>> >> >
>> >> >
>> >> > On Apr 7, 2

Re: Need help in starting storm on yarn using slider

2015-04-07 Thread Gour Saha

In a non-secured cluster you should run as yarn. Can you do that and let
us know how it goes?

Also, stop the existing storm instance that was created under the hdfs user
first (run this as the hdfs user) -
slider stop storm1

-Gour

On 4/7/15, 1:39 PM, "Chackravarthy Esakkimuthu" 
wrote:

>This is not a secured cluster.
>And yes, I used 'hdfs' user while running slider create.
>
>On Wed, Apr 8, 2015 at 2:03 AM, Gour Saha  wrote:
>
>> Which user are you running the slider create command as? Seems like you
>> are running as hdfs user. Is this a secured cluster?
>>
>> -Gour
>>
>> On 4/7/15, 1:06 PM, "Chackravarthy Esakkimuthu" 
>> wrote:
>>
>> >yes, RM HA has been setup in this cluster.
>> >
>> >Active : zs-aaa-001.nm.flipkart.com
>> >Standby : zs-aaa-002.nm.flipkart.com
>> >
>> >RM Link : http://zs-aaa-001.nm.flipkart.com:8088/cluster/scheduler
>> ><http://zs-exp-01.nm.flipkart.com:8088/cluster/scheduler>
>> >
>> >AM Link :
>> >
>> 
>>http://zs-aaa-001.nm.flipkart.com:8088/proxy/application_1427882795362_00
>>7
>> >0/slideram
>> ><
>> 
>>http://zs-exp-01.nm.flipkart.com:8088/proxy/application_1427882795362_007
>> >0/slideram>
>> >
>> >On Wed, Apr 8, 2015 at 1:05 AM, Gour Saha 
>>wrote:
>> >
>> >> Sorry forgot that the AM link not working was the original issue.
>> >>
>> >> Few more things -
>> >> - Seems like you have RM HA setup, right?
>> >> - Can you copy paste the complete link of the RM UI and the URL of
>> >> ApplicationMaster (the link which is broken) with actual hostnames?
>> >>
>> >>
>> >> -Gour
>> >>
>> >> On 4/7/15, 11:43 AM, "Chackravarthy Esakkimuthu"
>>> >
>> >> wrote:
>> >>
>> >> >Since 5 containers are running, which means that Storm daemons are
>> >>already
>> >> >up and running?
>> >> >
>> >> >
>> >> >Actually the ApplicationMaster link is not working. It just blanks
>>out
>> >> >printing the following :
>> >> >
>> >> >This is standby RM. Redirecting to the current active RM:
>> >> 
>>>http://:8088/proxy/application_1427882795362_0070/slideram
>> >> >
>> >> >
>> >> >And for resources.json, I dint make any change and used the copy of
>> >> >resources-default.json as follows:
>> >> >
>> >> >
>> >> >{
>> >> >
>> >> >  "schema" : "http://example.org/specification/v2.0.0",
>> >> >
>> >> >  "metadata" : {
>> >> >
>> >> >  },
>> >> >
>> >> >  "global" : {
>> >> >
>> >> >"yarn.log.include.patterns": "",
>> >> >
>> >> >"yarn.log.exclude.patterns": ""
>> >> >
>> >> >  },
>> >> >
>> >> >  "components": {
>> >> >
>> >> >"slider-appmaster": {
>> >> >
>> >> >  "yarn.memory": "512"
>> >> >
>> >> >},
>> >> >
>> >> >"NIMBUS": {
>> >> >
>> >> >  "yarn.role.priority": "1",
>> >> >
>> >> >  "yarn.component.instances": "1",
>> >> >
>> >> >  "yarn.memory": "2048"
>> >> >
>> >> >},
>> >> >
>> >> >"STORM_UI_SERVER": {
>> >> >
>> >> >  "yarn.role.priority": "2",
>> >> >
>> >> >  "yarn.component.instances": "1",
>> >> >
>> >> >  "yarn.memory": "1278"
>> >> >
>> >> >},
>> >> >
>> >> >"DRPC_SERVER": {
>> >> >
>> >> >  "yarn.role.priority": "3",
>> >> >
>> >> >  "yarn.component.instances": "1",
>> >> >
>> >> >  "yarn.memory": "1278"
>> >> >
>> >&

Re: Need help in starting storm on yarn using slider

2015-04-08 Thread Gour Saha

Jon was right. I think Storm uses ${USER_NAME} for app_user instead of hard
coding it as yarn, unlike hbase. So either user was fine.

One thing I saw in the AM and RM urls is that they link to 
zs-aaa-001.nm.flipkart.com and zs-exp-01.nm.flipkart.com. Can you hand edit the 
AM URL to try both the host aliases?

I am not sure the above will work. If it does not, it would be great if you
could send the entire AM logs.

-Gour

- Sent from my iPhone

> On Apr 7, 2015, at 11:08 PM, "Chackravarthy Esakkimuthu" 
>  wrote:
> 
> Tried running with 'yarn' user, but it remains in same state.
> AM link not working, and AM logs are similar.
> 
> On Wed, Apr 8, 2015 at 2:14 AM, Gour Saha  wrote:
> 
>> In a non-secured cluster you should run as yarn. Can you do that and let
>> us know how it goes?
>> 
>> Also you can stop your existing storm instance in hdfs user (run as hdfs
>> user) by running stop first -
>> slider stop storm1
>> 
>> -Gour
>> 
>> On 4/7/15, 1:39 PM, "Chackravarthy Esakkimuthu" 
>> wrote:
>> 
>>> This is not a secured cluster.
>>> And yes, I used 'hdfs' user while running slider create.
>>> 
>>>> On Wed, Apr 8, 2015 at 2:03 AM, Gour Saha  wrote:
>>>> 
>>>> Which user are you running the slider create command as? Seems like you
>>>> are running as hdfs user. Is this a secured cluster?
>>>> 
>>>> -Gour
>>>> 
>>>> On 4/7/15, 1:06 PM, "Chackravarthy Esakkimuthu" 
>>>> wrote:
>>>> 
>>>>> yes, RM HA has been setup in this cluster.
>>>>> 
>>>>> Active : zs-aaa-001.nm.flipkart.com
>>>>> Standby : zs-aaa-002.nm.flipkart.com
>>>>> 
>>>>> RM Link : http://zs-aaa-001.nm.flipkart.com:8088/cluster/scheduler
>>>>> <http://zs-exp-01.nm.flipkart.com:8088/cluster/scheduler>
>>>>> 
>>>>> AM Link :
>> http://zs-aaa-001.nm.flipkart.com:8088/proxy/application_1427882795362_00
>>>> 7
>>>>> 0/slideram
>>>>> <
>> http://zs-exp-01.nm.flipkart.com:8088/proxy/application_1427882795362_007
>>>>> 0/slideram>
>>>>> 
>>>>>> On Wed, Apr 8, 2015 at 1:05 AM, Gour Saha 
>>>>> wrote:
>>>>> 
>>>>>> Sorry forgot that the AM link not working was the original issue.
>>>>>> 
>>>>>> Few more things -
>>>>>> - Seems like you have RM HA setup, right?
>>>>>> - Can you copy paste the complete link of the RM UI and the URL of
>>>>>> ApplicationMaster (the link which is broken) with actual hostnames?
>>>>>> 
>>>>>> 
>>>>>> -Gour
>>>>>> 
>>>>>> On 4/7/15, 11:43 AM, "Chackravarthy Esakkimuthu"
>>>> >>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Since 5 containers are running, which means that Storm daemons are
>>>>>> already
>>>>>>> up and running?
>>>>>>> 
>>>>>>> 
>>>>>>> Actually the ApplicationMaster link is not working. It just blanks
>>>> out
>>>>>>> printing the following :
>>>>>>> 
>>>>>>> This is standby RM. Redirecting to the current active RM:
>>>>> http://:8088/proxy/application_1427882795362_0070/slideram
>>>>>>> 
>>>>>>> 
>>>>>>> And for resources.json, I dint make any change and used the copy of
>>>>>>> resources-default.json as follows:
>>>>>>> 
>>>>>>> 
>>>>>>> {
>>>>>>> 
>>>>>>> "schema" : "http://example.org/specification/v2.0.0",
>>>>>>> 
>>>>>>> "metadata" : {
>>>>>>> 
>>>>>>> },
>>>>>>> 
>>>>>>> "global" : {
>>>>>>> 
>>>>>>>   "yarn.log.include.patterns": "",
>>>>>>> 
>>>>>>>   "yarn.log.exclude.patterns": ""
>>>>>>> 
>>>>>>> },
>>>>>>> 
>>>>>>> "components": {
>>>>>>> 
>>>>>>>

Re: Need help in starting storm on yarn using slider

2015-04-08 Thread Gour Saha

Chackra,

We believe you are running into a redirection issue that occurs when RM HA is set up -
https://issues.apache.org/jira/browse/YARN-1525

https://issues.apache.org/jira/browse/YARN-1811


These were fixed in Hadoop 2.6 (the version you have). But we still found
issues with Slider AM UI in Slider version 0.60 (the version you are
using) on top of Hadoop 2.6.


I thought we filed a JIRA on it, but could not find any. I went ahead and
filed one now -
https://issues.apache.org/jira/browse/SLIDER-846



Workaround -
Is this a production cluster? If not, can you disable RM HA and check if
you can access the AM UI and also run all slider command lines
successfully? This is a basic test to ensure that this is indeed
happening because of the RM HA setup.

Once we verify the above, revert to RM HA again. I think we can make
the Slider AM UI work in the RM HA setup by doing this (we haven’t tested
this so not 100% sure it will work) -

In the RM HA setup we can use YARN labels and constrain the Slider AM to
come up in the active RM node. Let me know if you want to try this route
and I would be happy to help you out with details on how to set this up.


-Gour

On 4/8/15, 9:17 AM, "Chackravarthy Esakkimuthu" 
wrote:

>No, iptables is not enabled i think. (will confirm)
>But, AM is running, even other containers are running and I could see
>storm/hbase daemons running in those nodes.
>Does this mean installation is successful? How do I check the status of
>the
>installation?
>
>Tried using slider command with no success, (Please let me know if am I
>using it wrongly)
>- storm-yarn-1 and hb1 are the names which I used to for "slider create"
>command.
>
>/usr/hdp/current/slider-client/bin/./slider status *storm-yarn-1*
>2015-04-08 21:40:17,178 [main] INFO  impl.TimelineClientImpl - Timeline
>service address: http://host2:8188/ws/v1/timeline/
>2015-04-08 21:40:17,782 [main] WARN  shortcircuit.DomainSocketFactory -
>The
>short-circuit local reads feature cannot be used because libhadoop cannot
>be loaded.
>2015-04-08 21:40:17,936 [main] INFO
> client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
>2015-04-08 21:40:17,970 [main] ERROR main.ServiceLauncher - *Unknown
>application instance : storm-yarn-1*
>2015-04-08 21:40:17,971 [main] INFO  util.ExitUtil - Exiting with status
>69
>
>/usr/hdp/current/slider-client/bin/./slider status *hb1*
>2015-04-08 21:40:31,344 [main] INFO  impl.TimelineClientImpl - Timeline
>service address: http://host2:8188/ws/v1/timeline/
>2015-04-08 21:40:32,075 [main] WARN  shortcircuit.DomainSocketFactory -
>The
>short-circuit local reads feature cannot be used because libhadoop cannot
>be loaded.
>2015-04-08 21:40:32,263 [main] INFO
> client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
>2015-04-08 21:40:32,306 [main] ERROR main.ServiceLauncher - *Unknown
>application instance : hb1*
>2015-04-08 21:40:32,308 [main] INFO  util.ExitUtil - Exiting with status
>69
>
>
>On Wed, Apr 8, 2015 at 7:14 PM, Jon Maron  wrote:
>
>> Indications seem to be that the AM is started but the AM URI you’re
>> attempting to attach to may be mistaken or there may be something
>> preventing the actual connection.  Any chance iptables is enabled?
>>
>>
>> > On Apr 8, 2015, at 3:44 AM, Gour Saha  wrote:
>> >
>> > Jon was right. I think Storm uses ${USER_NAME} for app_user instead of
>> hard coding as yarn unlike hbase. So either users were fine.
>> >
>> > One thing I saw in the AM and RM urls is that they link to
>> zs-aaa-001.nm.flipkart.com and zs-exp-01.nm.flipkart.com. Can you hand
>> edit the AM URL to try both the host aliases?
>> >
>> > I am not sure if the above will work in which case if you could send
>>the
>> entire AM logs then it would be great.
>> >
>> > -Gour
>> >
>> > - Sent from my iPhone
>> >
>> >> On Apr 7, 2015, at 11:08 PM, "Chackravarthy Esakkimuthu" <
>> chaku.mi...@gmail.com> wrote:
>> >>
>> >> Tried running with 'yarn' user, but it remains in same state.
>> >> AM link not working, and AM logs are similar.
>> >>
>> >> On Wed, Apr 8, 2015 at 2:14 AM, Gour Saha 
>> wrote:
>> >>
>> >>> In a non-secured cluster you should run as yarn. Can you do that and
>> let
>> >>> us know how it goes?
>> >>>
>> >>> Also you can stop your existing storm instance in hdfs user (run as
>> hdfs
>> >>> user) by running stop first -
>> >>> slider stop storm1
>> >>>
>> >>> -Gour
>> >>>
>> >>>

Re: Need help in starting storm on yarn using slider

2015-04-09 Thread Gour Saha

You don’t need to do su . There is an option --user which you can
provide on the command line. The storm-slider script is a wrapper on top of
the slider command line, and it has a syntax mapping issue. You can use the
slider command line directly to do the same and much more.

For example, running this fetches the quicklinks for a specific user:

slider registry --name storm2 --user yarn --getexp quicklinks

I will file a bug on the storm-slider script.
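
If you end up scripting this, the command above can be assembled along these lines. This is only a minimal sketch: the helper name slider_registry_cmd is made up for illustration, and it uses nothing beyond the flags already shown in this message (--name, --user, --getexp).

```python
import shlex


def slider_registry_cmd(app_name, user, export="quicklinks"):
    """Build the argv for `slider registry` (hypothetical helper).

    Only the flags quoted in this thread are used: --name, --user, --getexp.
    """
    return ["slider", "registry", "--name", app_name,
            "--user", user, "--getexp", export]


# Print the command line for the storm2 instance owned by the yarn user.
print(shlex.join(slider_registry_cmd("storm2", "yarn")))
```

The list form can be passed to subprocess.run as-is, which avoids shell quoting problems with instance names.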

-Gour

On 4/9/15, 8:53 AM, "Chackravarthy Esakkimuthu" 
wrote:

>sure, in this case may be the following doc can be updated to use 'su
>' of respective user for these commands. Because all other commands
>have clear 'usage' template.
>
>http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/runnin
>g_applications_on_slider/index.html#Item1.1
>
>On Thu, Apr 9, 2015 at 9:10 PM, Jon Maron  wrote:
>
>> Aside from the yarn issue you discovered wrt HA, if you have any
>> recommendations for usability/diagnosability, please feel free to let us
>> know or file JIRAs (e.g.  perhaps the error message below should add
>> “Please make sure you are logged in as the application owner” :) )
>>
>> - Jon
>>
>> > On Apr 9, 2015, at 11:29 AM, Chackravarthy Esakkimuthu <
>> chaku.mi...@gmail.com> wrote:
>> >
>> > Thanks steve, I was running with 'yarn' user while creating storm
>> > application on YARN, but forgot to run as 'yarn' user while checking
>>the
>> > application status.
>> > And yeah, connected to zookeeper and checked under "/registry/users".
>>(as
>> > well as the way you suggested)
>> >
>> > Thanks all, Now I could able to submit sample topology on deployed
>>storm
>> > also. :)
>> >
>> > I will try other actions like stopping/killing the running instance.
>> Thanks
>> > guys again!!
>> >
>> >
>> > On Thu, Apr 9, 2015 at 7:03 PM, Steve Loughran
>>
>> > wrote:
>> >
>> >>
>> >> On 9 Apr 2015, at 12:51, Chackravarthy Esakkimuthu <
>> chaku.mi...@gmail.com
>> >> > wrote:
>> >>
>> >> *2015-04-09 17:14:44,667 [main] ERROR main.ServiceLauncher -
>> >> /registry/users/chackaravarthy.e/services/org-apache-slider/storm2*
>> >> *2015-04-09 17:14:44,671 [main] INFO  util.ExitUtil - Exiting with
>> status
>> >> 44*
>> >>
>> >>
>> >> 44 is our exit code "not found" (== 404); it's saying the registry
>>entry
>> >> did not exist in zookeeper
>> >>
>> >> Looking at the path, I worry about the username chackaravarthy.e ;
>>maybe
>> >> its registering under a different user.
>> >>
>> >>
>> >>  1.  Can you get to the Slider AM page via the RM?
>> >>  2.  Can you then look at the listing of exported URLs there?
>>Especially
>> >> one "registry".
>> >>  3.  Click on that and you can then browse a JSON view of the
>>registry;
>> >>  4.  add /users to the end of the path to get the listing of all
>>users,
>> >> /users/chackaravarthy.e/  to see the services you have under you ...
>>you
>> >> should be able to continue down to see if there is a storm2 service
>> entry
>> >>
>> >>
>> >>
>>
>>

Re: Need help in starting storm on yarn using slider

2015-04-09 Thread Gour Saha

Awesome. Happy to get you up and running.

So I presume you were able to get the JSON dump for the quicklinks command
line. This is what I get in my local vm.

{
  "org.apache.slider.jmx" :
"http://c6410.ambari.apache.org:56329/api/v1/cluster/summary",
  "org.apache.slider.metrics" :
"http://c6410.ambari.apache.org:6188/ws/v1/timeline/metrics",
  "nimbus.host_port" : "http://c6410.ambari.apache.org:43987",
  "org.apache.slider.monitor" : "http://c6410.ambari.apache.org:56329"
}


You can go to the link of the property org.apache.slider.monitor and get
to the Storm UI. The UI does throw some exception if you access it early
before Storm instances come up. But keep refreshing and the page will show
up.
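
Picking the Storm UI link out of that dump can be sketched as below. This is a rough illustration, assuming the quicklinks output is a flat JSON object of name to URL like the one from my VM; the function name monitor_url is hypothetical and not part of Slider.

```python
import json

# Sample quicklinks dump, using the host and ports from my local VM.
QUICKLINKS_JSON = """
{
  "org.apache.slider.jmx" : "http://c6410.ambari.apache.org:56329/api/v1/cluster/summary",
  "org.apache.slider.metrics" : "http://c6410.ambari.apache.org:6188/ws/v1/timeline/metrics",
  "nimbus.host_port" : "http://c6410.ambari.apache.org:43987",
  "org.apache.slider.monitor" : "http://c6410.ambari.apache.org:56329"
}
"""


def monitor_url(quicklinks_text):
    """Return the org.apache.slider.monitor URL, or None if it is missing."""
    links = json.loads(quicklinks_text)
    return links.get("org.apache.slider.monitor")


print(monitor_url(QUICKLINKS_JSON))
```

On your cluster the host and port will differ, since the ports are allocated when the containers come up.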

By the way did you use Ambari? If yes, then you can use Ambari Views for
Slider and deploy apps in a much faster, cleaner and visual way. It also
provides links in the Slider View UI to the quicklinks once your app is
running. You can also enable metrics in the UI and get some basic graphs
which can be extended if required.


-Gour

On 4/9/15, 8:29 AM, "Chackravarthy Esakkimuthu" 
wrote:

>Thanks steve, I was running with 'yarn' user while creating storm
>application on YARN, but forgot to run as 'yarn' user while checking the
>application status.
>And yeah, connected to zookeeper and checked under "/registry/users". (as
>well as the way you suggested)
>
>Thanks all, Now I could able to submit sample topology on deployed storm
>also. :)
>
>I will try other actions like stopping/killing the running instance.
>Thanks
>guys again!!
>
>
>On Thu, Apr 9, 2015 at 7:03 PM, Steve Loughran 
>wrote:
>
>>
>> On 9 Apr 2015, at 12:51, Chackravarthy Esakkimuthu
>>> > wrote:
>>
>> *2015-04-09 17:14:44,667 [main] ERROR main.ServiceLauncher -
>> /registry/users/chackaravarthy.e/services/org-apache-slider/storm2*
>> *2015-04-09 17:14:44,671 [main] INFO  util.ExitUtil - Exiting with
>>status
>> 44*
>>
>>
>> 44 is our exit code "not found" (== 404); it's saying the registry entry
>> did not exist in zookeeper
>>
>> Looking at the path, I worry about the username chackaravarthy.e ; maybe
>> its registering under a different user.
>>
>>
>>   1.  Can you get to the Slider AM page via the RM?
>>   2.  Can you then look at the listing of exported URLs there?
>>Especially
>> one "registry".
>>   3.  Click on that and you can then browse a JSON view of the registry;
>>   4.  add /users to the end of the path to get the listing of all users,
>> /users/chackaravarthy.e/  to see the services you have under you ... you
>> should be able to continue down to see if there is a storm2 service
>>entry
>>
>>
>>

Re: Invalid port 0 for storm instances

2015-04-11 Thread Gour Saha

There were no issues. The variable was renamed to be more user friendly.

-Gour

On 4/10/15, 3:48 PM, "Nitin Aggarwal"  wrote:

>My mistake, we are running slider version 0.50. I believe these configs
>were changed in 0.60 version.
>Also, were there any issues around port allocation that could fix this
>issue, so that I can back-port them to 0.50 ?
>
>On Wed, Apr 1, 2015 at 3:19 PM, Sumit Mohanty 
>wrote:
>
>> "${SUPERVISOR.ALLOCATED_PORT}{DO_NOT_PROPAGATE}",
>>
>> should be changed to
>>
>> "${SUPERVISOR.ALLOCATED_PORT}{PER_CONTAINER}",
>>
>> As a reference -
>> 
>>https://github.com/apache/incubator-slider/blob/develop/app-packages/stor
>>m/appConfig-default.json
>>
>> I think even DEF_ZK_PATH should be DEFAULT_ZK_PATH
>> 
>> From: Nitin Aggarwal 
>> Sent: Monday, March 30, 2015 10:38 AM
>> To: dev@slider.incubator.apache.org
>> Subject: Re: Invalid port 0 for storm instances
>>
>> Yes, storm package is built internally.
>>
>> App configuration:
>>
>> "appConf" :{
>>   "schema" : "http://example.org/specification/v2.0.0",
>>   "metadata" : { },
>>   "global" : {
>> "agent.conf" : "/apps/slider/agent/conf/agent.ini",
>> "application.def" :
>>"/apps/slider/app-packages/storm/storm_v0_9_4.zip",
>> "config_types" : "storm-site",
>> "create.default.zookeeper.node" : "true",
>> "env.MALLOC_ARENA_MAX" : "4",
>> "java_home" : "/usr/java/jdk1.7.0_40",
>> "package_list" : "files/storm-0.9.4-SNAPSHOT-bin.tar.gz",
>> "site.fs.default.name" : "hdfs://X/",
>> "site.fs.defaultFS" : "hdfs://XX:8020/",
>> "site.global.app_install_dir" : "${AGENT_WORK_ROOT}/app/install",
>> "site.global.app_log_dir" : "/srv/var/hadoop/logs/deathstar",
>> "site.global.app_pid_dir" : "${AGENT_WORK_ROOT}/app/run",
>> "site.global.app_root" :
>> "${AGENT_WORK_ROOT}/app/install/apache-storm-0.9.4-SNAPSHOT",
>> "site.global.app_user" : "yarn",
>> "site.global.ganglia_enabled" : "false",
>> "site.global.ganglia_server_host" : "${NN_HOST}",
>> "site.global.ganglia_server_id" : "Application2",
>> "site.global.ganglia_server_port" : "8668",
>> "site.global.hbase_instance_name" : "XX",
>> "site.global.opentsdb_server_host" : "X",
>> "site.global.opentsdb_server_port" : "4242",
>> "site.global.rest_api_admin_port" :
>>"${STORM_REST_API.ALLOCATED_PORT}",
>> "site.global.rest_api_port" : "${STORM_REST_API.ALLOCATED_PORT}",
>> "site.global.security_enabled" : "false",
>> "site.global.storm_instance_name" : "X",
>> "site.global.user_group" : "hadoop",
>> "site.storm-site.dev.zookeeper.path" :
>> "${AGENT_WORK_ROOT}/app/tmp/dev-storm-zookeeper",
>> "site.storm-site.drpc.childopts" : "-Xmx768m",
>> "site.storm-site.drpc.invocations.port" : "0",
>> "site.storm-site.drpc.port" : "0",
>> "site.storm-site.drpc.queue.size" : "128",
>> "site.storm-site.drpc.request.timeout.secs" : "600",
>> "site.storm-site.drpc.worker.threads" : "64",
>> "site.storm-site.java.library.path" :
>>
>> 
>>"/etc/hadoop/conf:/usr/lib/hadoop/lib/native:/usr/local/lib:/opt/local/li
>>b:/usr/lib",
>> "site.storm-site.logviewer.appender.name" : "A1",
>> "site.storm-site.logviewer.childopts" : "-Xmx128m",
>> "site.storm-site.logviewer.port" :
>> "${SUPERVISOR.ALLOCATED_PORT}{DO_NOT_PROPAGATE}",
>> "site.storm-site.nimbus.childopts" : "-Xmx1024m",
>> "site.storm-site.nimbus.cleanup.inbox.freq.secs" : "600",
>> "site.storm-site.nimbus.file.copy.expiration.secs" : "600",
>> "site.storm-site.nimbus.host" : "${NIMBUS_HOST}",
>> "site.storm-site.nimbus.inbox.jar.expiration.secs" : "3600",
>> "site.storm-site.nimbus.monitor.freq.secs" : "10",
>> "site.storm-site.nimbus.reassign" : "true",
>> "site.storm-site.nimbus.supervisor.timeout.secs" : "60",
>> "site.storm-site.nimbus.task.launch.secs" : "120",
>> "site.storm-site.nimbus.task.timeout.secs" : "5",
>> "site.storm-site.nimbus.thrift.max_buffer_size" : "1048576",
>> "site.storm-site.nimbus.thrift.port" : "${NIMBUS.ALLOCATED_PORT}",
>> "site.storm-site.nimbus.topology.validator" :
>> "backtype.storm.nimbus.DefaultTopologyValidator",
>> "site.storm-site.storm.cluster.mode" : "distributed",
>> "site.storm-site.storm.local.dir" :
>>"${AGENT_WORK_ROOT}/app/tmp/storm",
>> "site.storm-site.storm.local.mode.zmq" : "false",
>> "site.storm-site.storm.messaging.netty.buffer_size" : "5242880",
>> "site.storm-site.storm.messaging.netty.client_worker_threads" : "1",
>> "site.storm-site.storm.messaging.netty.max_retries" : "300",
>> "site.storm-site.storm.messaging.netty.max_wait_ms" : "1000",
>> "site.storm-site.storm.messaging.netty.min_wait_ms" : "100",
>> "site.storm-site.storm.messaging.netty.server_worker_threads" : "1",
>> "site.storm-site.storm.messaging.transport" :
>> "backtype.storm.messaging.netty.Context",
>> "site.storm-site.storm.thrift.transpor

Re: Doubts on stop/destroy the application instance

2015-04-27 Thread Gour Saha

Calling “slider stop” before “slider destroy” is the right order.

On calling stop, your storm cluster should be completely stopped
(including Slider AM and all storm components).

Can you run this command after stop and send the output (don’t run destroy
yet)?

slider list  --containers

Also, at this point you should check the RM UI and it should show that the
yarn app is in stopped state.
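
The check between stop and destroy could be scripted roughly as follows. This is a sketch under assumptions: it assumes each instance row in the list output is whitespace-separated as name, state, application id, and that FINISHED is the state YARN reports for a stopped app. The sample text and the function names are made up for illustration.

```python
def app_state(list_output, app_name):
    """Return the state reported for app_name, or None if there is no row."""
    for line in list_output.splitlines():
        fields = line.split()
        # Log lines start with a timestamp, so they never match the app name.
        if len(fields) >= 2 and fields[0] == app_name:
            return fields[1]
    return None


def safe_to_destroy(list_output, app_name):
    """Only destroy once the instance is reported as FINISHED (assumed state)."""
    return app_state(list_output, app_name) == "FINISHED"


# Hypothetical list output captured after "slider stop".
SAMPLE = (
    "2015-04-28 01:14:25,669 [main] INFO  client.RMProxy - Connecting\n"
    "storm1  FINISHED  application_1428575950531_0013\n"
)
print(safe_to_destroy(SAMPLE, "storm1"))
```

This only reflects what YARN reports. It does not prove the storm processes on the nodes are gone, so the RM UI check still applies.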

-Gour

On 4/27/15, 11:52 AM, "Chackravarthy Esakkimuthu" 
wrote:

>I started the storm on yarn (slider create)
>Then wanted to test whether destroying the storm works or not.
>So I tried in the following order :
>
>1) slider stop 
>-- in this case, sliderAM alone stopped, and all the other storm daemons
>like Nimbus, supervisor, log_viewer,  drpc, UI_Server was running. (along
>with slider agents)
>
>Is this just an intermediate state before issuing destroy command?
>
>2) slider destroy 
>-- in this case, only nimbus and supervisor got killed. The other storm
>daemons (log_viewer,  drpc, UI_Server) still running. And slider agents
>too
>still running in all the 4 containers.
>
>This issue I face in 0.60 release. Then I tried with 0.71 release. But
>still same behaviour exists.
>
>Am I using the command in wrong way (or some other order) ? or issue
>exists.
>
>Thanks in advance!
>
>
>Thanks,
>Chackra

Re: Doubts on stop/destroy the application instance

2015-04-27 Thread Gour Saha

Sorry, forgot that --containers is supported in develop branch only. Just
run list without that option.

Seems like the running processes are stray processes from old experimental
runs. Can you check the date/time of these processes?

If you bring the storm instance up again, do you see new instances of
nimbus, supervisor, etc. getting created? The old stray ones will probably
still be there.

Also, can you run just “slider list” (no other params) and send the output?

-Gour

On 4/27/15, 12:20 PM, "Chackravarthy Esakkimuthu" 
wrote:

>There is some issue in that command usage (i tried giving the params in
>the
>the order also)
>
>sudo -u yarn /usr/hdp/current/slider-client/bin/./slider list storm1
>--containers
>
>2015-04-28 00:42:01,017 [main] ERROR main.ServiceLauncher -
>com.beust.jcommander.ParameterException: Unknown option: --containers in
>list storm1 --containers
>
>2015-04-28 00:42:01,021 [main] INFO  util.ExitUtil - Exiting with status
>40
>
>Anyway, I issued STOP command and checked in the RM UI, the application is
>stopped and all the 5 containers are released.. It shows as ZERO
>containers
>is running.
>
>But, when I login to that machine, I could see storm components are still
>running there (ps -ef | grep storm). The processes are up. Even Storm UI
>is
>still accessible.
>
>
>
>On Tue, Apr 28, 2015 at 12:29 AM, Gour Saha  wrote:
>
>> Calling “slider stop” before “slider destroy” is the right order.
>>
>> On calling stop, your storm cluster should be completely stopped
>> (including Slider AM and all storm components).
>>
>> Can you run this command after stop and send the output (don’t run
>> destroy yet)?
>>
>> slider list  --containers
>>
>> Also, at this point you should check the RM UI and it should show that
>>the
>> yarn app is in stopped state.
>>
>> -Gour
>>
>> On 4/27/15, 11:52 AM, "Chackravarthy Esakkimuthu"
>>
>> wrote:
>>
>> >I started the storm on yarn (slider create)
>> >Then wanted to test whether destroying the storm works or not.
>> >So I tried in the following order :
>> >
>> >1) slider stop 
>> >-- in this case, sliderAM alone stopped, and all the other storm
>>daemons
>> >like Nimbus, supervisor, log_viewer,  drpc, UI_Server was running.
>>(along
>> >with slider agents)
>> >
>> >Is this just an intermediate state before issuing destroy command?
>> >
>> >2) slider destroy 
>> >-- in this case, only nimbus and supervisor got killed. The other storm
>> >daemons (log_viewer,  drpc, UI_Server) still running. And slider agents
>> >too
>> >still running in all the 4 containers.
>> >
>> >This issue I face in 0.60 release. Then I tried with 0.71 release. But
>> >still same behaviour exists.
>> >
>> >Am I using the command in wrong way (or some other order) ? or issue
>> >exists.
>> >
>> >Thanks in advance!
>> >
>> >
>> >Thanks,
>> >Chackra
>>
>>

Re: Doubts on stop/destroy the application instance

2015-04-27 Thread Gour Saha

Hmm.. Interesting.

Is it possible to run "ps -ef | grep storm" before and after the storm1
app is started and send the output?

-Gour

On 4/27/15, 12:48 PM, "Chackravarthy Esakkimuthu" 
wrote:

>No, the processes are not old one, because it shows the class path which
>has folder names corresponds to newly launched application id. (also every
>time before launching new application, I made sure that all processes are
>killed)
>
>And the output of list command as follows :
>
>sudo -u yarn /usr/hdp/current/slider-client/bin/./slider list
>2015-04-28 01:14:24,568 [main] INFO  impl.TimelineClientImpl - Timeline
>service address: http://host2:8188/ws/v1/timeline/
>2015-04-28 01:14:25,669 [main] INFO  client.RMProxy - Connecting to
>ResourceManager at host2/XX.XX.XX.XX:8050
>storm1    FINISHED    application_1428575950531_0013
>
>2015-04-28 01:14:26,108 [main] INFO  util.ExitUtil - Exiting with status 0
>
>On Tue, Apr 28, 2015 at 1:01 AM, Gour Saha  wrote:
>
>> Sorry, forgot that --containers is supported in develop branch only.
>>Just
>> run list without that option.
>>
>> Seems like the running processes are stray processes from old
>>experimental
>> runs. Can you check the date/time of these processes?
>>
>> If you bring the storm instance up again, do you see new instances of
>> nimbus, supervisor, etc. getting created? The old stray ones will
>>probably
>> still be there.
>>
>> Also, can you run just “slider list” (no other params) and send the
>>output?
>>
>> -Gour
>>
>> On 4/27/15, 12:20 PM, "Chackravarthy Esakkimuthu"
>>
>> wrote:
>>
>> >There is some issue in that command usage (i tried giving the params in
>> >the
>> >the order also)
>> >
>> >sudo -u yarn /usr/hdp/current/slider-client/bin/./slider list storm1
>> >--containers
>> >
>> >2015-04-28 00:42:01,017 [main] ERROR main.ServiceLauncher -
>> >com.beust.jcommander.ParameterException: Unknown option: --containers
>>in
>> >list storm1 --containers
>> >
>> >2015-04-28 00:42:01,021 [main] INFO  util.ExitUtil - Exiting with
>>status
>> >40
>> >
>> >Anyway, I issued STOP command and checked in the RM UI, the
>>application is
>> >stopped and all the 5 containers are released.. It shows as ZERO
>> >containers
>> >is running.
>> >
>> >But, when I login to that machine, I could see storm components are
>>still
>> >running there (ps -ef | grep storm). The processes are up. Even Storm
>>UI
>> >is
>> >still accessible.
>> >
>> >
>> >
>> >On Tue, Apr 28, 2015 at 12:29 AM, Gour Saha 
>> wrote:
>> >
>> >> Calling "slider stop" before "slider destroy" is the right order.
>> >>
>> >> On calling stop, your storm cluster should be completely stopped
>> >> (including Slider AM and all storm components).
>> >>
>> >> Can you run this command after stop and send the output (don't run
>> >>destroy
>> >> yet)?
>> >>
>> >> slider list  --containers
>> >>
>> >> Also, at this point you should check the RM UI and it should show
>>that
>> >>the
>> >> yarn app is in stopped state.
>> >>
>> >> -Gour
>> >>
>> >> On 4/27/15, 11:52 AM, "Chackravarthy Esakkimuthu"
>> >>
>> >> wrote:
>> >>
>> >> >I started the storm on yarn (slider create)
>> >> >Then wanted to test whether destroying the storm works or not.
>> >> >So I tried in the following order :
>> >> >
>> >> >1) slider stop 
>> >> >-- in this case, sliderAM alone stopped, and all the other storm
>> >>daemons
>> >> >like Nimbus, supervisor, log_viewer,  drpc, UI_Server was running.
>> >>(along
>> >> >with slider agents)
>> >> >
>> >> >Is this just an intermediate state before issuing destroy command?
>> >> >
>> >> >2) slider destroy 
>> >> >-- in this case, only nimbus and supervisor got killed. The other
>>storm
>> >> >daemons (log_viewer,  drpc, UI_Server) still running. And slider
>>agents
>> >> >too
>> >> >still running in all the 4 containers.
>> >> >
>> >> >This issue I face in 0.60 release. Then I tried with 0.71 release.
>>But
>> >> >still same behaviour exists.
>> >> >
>> >> >Am I using the command in wrong way (or some other order) ? or issue
>> >> >exists.
>> >> >
>> >> >Thanks in advance!
>> >> >
>> >> >
>> >> >Thanks,
>> >> >Chackra
>> >>
>> >>
>>
>>

Re: Doubts on stop/destroy the application instance

2015-04-27 Thread Gour Saha

Yes, those processes correspond to slider agent.

Based on the issue you are facing let’s do this -

Run “slider start storm1” again, it should create
application_1428575950531_0014 (with id 0014). After that can you check if
the processes from application_1428575950531_0013 are still running? If
yes, then run “slider stop storm1” again and then do you see processes
from both application_1428575950531_0013 and
application_1428575950531_0014 running?
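
A minimal sketch of that check, assuming the `ps -ef` output has been captured as text. The helper name `processes_by_app` is hypothetical; it only matches the `application_...` and `container_...` ids that show up in the command lines, so treat it as a convenience and not as a Slider tool:

```python
import re
from collections import defaultdict


def processes_by_app(ps_output, app_ids):
    """Group `ps -ef` lines by the YARN application id they mention.

    Assumes every id looks like application_<timestamp>_<sequence>.
    """
    found = defaultdict(list)
    for line in ps_output.splitlines():
        for app_id in app_ids:
            # Container ids embed the same "<timestamp>_<sequence>" suffix.
            suffix = app_id.split("_", 1)[1]
            pattern = r"(application|container)_" + re.escape(suffix) + r"(_|\b)"
            if re.search(pattern, line):
                found[app_id].append(line)
    return dict(found)
```

It can be fed the text returned by `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`; an application id that is missing from the result has no matching processes on that node.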

-Gour

On 4/27/15, 1:11 PM, "Chackravarthy Esakkimuthu" 
wrote:

>And how do we confirm that slider agents are stopped in each node where
>the
>container is allocated?
>because even after stop command and even destroy command, I could see
>agents seems to be running in all those nodes.
>
>yarn 47909 47907  0 00:37 ?00:00:00 /bin/bash -c python
>./infra/agent/slider-agent/agent/main.py --label
>container_1428575950531_0013_01_02___NIMBUS --zk-quorum
>host1:2181,host2:2181,host3:2181 --zk-reg-path
>/registry/users/yarn/services/org-apache-slider/storm1 >
>/var/log/hadoop-yarn/application_1428575950531_0013/container_142857595053
>1_0013_01_02/slider-agent.out
>2>&1
>yarn 47915 47909  0 00:37 ?00:00:02 python
>./infra/agent/slider-agent/agent/main.py --label
>container_1428575950531_0013_01_02___NIMBUS --zk-quorum
>host1:2181,host2:2181,host3:2181 --zk-reg-path
>/registry/users/yarn/services/org-apache-slider/storm1
>
>Doesn't these processes correspond to slider agent?
>
>On Tue, Apr 28, 2015 at 1:32 AM, Chackravarthy Esakkimuthu <
>chaku.mi...@gmail.com> wrote:
>
>> 1) slider create storm1
>> --- it started all the components, SliderAM, slider agents. And storm UI
>> was accessible. Also manually logged into each host and verified all
>> components are up and running.
>>
>> 2) slider stop storm1
>> --- it stopped SliderAM
>> --- but all the components are running along with slider agents. And
>>storm
>> UI was accessible.
>>
>> 3) slider start storm1 (RM UI was less responsive during this time)
>> --- it started another sliderAM and other set of storm components and
>> slider agents also. And able to access storm UI in another host.
>>
>> So now, actually two storm cluster is running though I used same name
>> "storm1"
>>
>> On Tue, Apr 28, 2015 at 1:23 AM, Gour Saha 
>>wrote:
>>
>>> Hmm.. Interesting.
>>>
>>> Is it possible to run "ps -ef | grep storm" before and after the storm1
>>> app is started and send the output?
>>>
>>> -Gour
>>>
>>> On 4/27/15, 12:48 PM, "Chackravarthy Esakkimuthu"
>>>
>>> wrote:
>>>
>>> >No, the processes are not old one, because it shows the class path
>>>which
>>> >has folder names corresponds to newly launched application id. (also
>>> every
>>> >time before launching new application, I made sure that all processes
>>>are
>>> >killed)
>>> >
>>> >And the output of list command as follows :
>>> >
>>> >sudo -u yarn /usr/hdp/current/slider-client/bin/./slider list
>>> >2015-04-28 01:14:24,568 [main] INFO  impl.TimelineClientImpl -
>>>Timeline
>>> >service address: http://host2:8188/ws/v1/timeline/
>>> >2015-04-28 01:14:25,669 [main] INFO  client.RMProxy - Connecting to
>>> >ResourceManager at host2/XX.XX.XX.XX:8050
>>> >storm1FINISHED
>>> application_1428575950531_0013
>>> >
>>> >2015-04-28 01:14:26,108 [main] INFO  util.ExitUtil - Exiting with
>>>status
>>> 0
>>> >
>>> >On Tue, Apr 28, 2015 at 1:01 AM, Gour Saha 
>>> wrote:
>>> >
>>> >> Sorry, forgot that --containers is supported in develop branch only.
>>> >>Just
>>> >> run list without that option.
>>> >>
>>> >> Seems like the running processes are stray processes from old
>>> >>experimental
>>> >> runs. Can you check the date/time of these processes?
>>> >>
>>> >> If you bring the storm instance up again, do you see new instances
>>>of
>>> >> nimbus, supervisor, etc. getting created? The old stray ones will
>>> >>probably
>>> >> still be there.
>>> >>
>>> >> Also, can you run just “slider list” (no other params) and send the
>>> >>output?
>>> >>
>>> >> -Gour
>>> >>
>>> >> On 4/27/15, 12:20

Re: Doubts on stop/destroy the application instance

2015-04-27 Thread Gour Saha

To dig deeper, I would need the Slider AM log (slider.log) and at least
one of the agent logs (slider-agent.log), say the one for Nimbus.

They will be under -
/hadoop/yarn/log///

OR you can run -
yarn logs -applicationId 
and dump it in a file, if the  directory under /hadoop/yarn/log is
missing.
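
A small sketch of the `yarn logs` option, assuming the `yarn` CLI is on the PATH of the user running it. The function names and the output path are illustrative only:

```python
import subprocess


def yarn_logs_command(application_id):
    """Build the `yarn logs` invocation for one application."""
    return ["yarn", "logs", "-applicationId", application_id]


def dump_yarn_logs(application_id, out_path):
    """Run `yarn logs` and write its stdout to out_path; returns the exit code."""
    with open(out_path, "w") as out:
        return subprocess.run(yarn_logs_command(application_id), stdout=out).returncode
```

For example, `dump_yarn_logs("application_1428575950531_0013", "storm1-0013.log")` would leave the aggregated logs in one file that can be attached to a mail.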


Also, if you could provide the Node Manager logs, it would help. They are
under -
/var/log/hadoop-yarn/yarn/

and the file names have the format - yarn-yarn-nodemanager-.log

-Gour

On 4/27/15, 1:32 PM, "Chackravarthy Esakkimuthu" 
wrote:

>Run “slider start storm1” again, it should create
>application_1428575950531_0014
>(with id 0014).
>   ---> yes it does
>
>After that can you check if the processes from
>application_1428575950531_0013 are still running?
>   ---> yes
>
>If yes, then run “slider stop storm1” again and then do you see processes
>from
>both application_1428575950531_0013 and application_1428575950531_0014
>running?
>   ---> yes both are running and able to access both storm UI's also.
>(only
>SliderAM was stopped)
>
>On Tue, Apr 28, 2015 at 1:54 AM, Gour Saha  wrote:
>
>> Yes, those processes correspond to slider agent.
>>
>> Based on the issue you are facing let’s do this -
>>
>> Run “slider start storm1” again, it should create
>> application_1428575950531_0014 (with id 0014). After that can you check
>>if
>> the processes from application_1428575950531_0013 are still running? If
>> yes, then run “slider stop storm1” again and then do you see processes
>> from both application_1428575950531_0013 and
>> application_1428575950531_0014 running?
>>
>> -Gour
>>
>> On 4/27/15, 1:11 PM, "Chackravarthy Esakkimuthu" 
>> wrote:
>>
>> >And how do we confirm that slider agents are stopped in each node where
>> >the
>> >container is allocated?
>> >because even after stop command and even destroy command, I could see
>> >agents seems to be running in all those nodes.
>> >
>> >yarn 47909 47907  0 00:37 ?00:00:00 /bin/bash -c python
>> >./infra/agent/slider-agent/agent/main.py --label
>> >container_1428575950531_0013_01_02___NIMBUS --zk-quorum
>> >host1:2181,host2:2181,host3:2181 --zk-reg-path
>> >/registry/users/yarn/services/org-apache-slider/storm1 >
>> 
>>>/var/log/hadoop-yarn/application_1428575950531_0013/container_1428575950
>>>53
>> >1_0013_01_02/slider-agent.out
>> >2>&1
>> >yarn 47915 47909  0 00:37 ?00:00:02 python
>> >./infra/agent/slider-agent/agent/main.py --label
>> >container_1428575950531_0013_01_02___NIMBUS --zk-quorum
>> >host1:2181,host2:2181,host3:2181 --zk-reg-path
>> >/registry/users/yarn/services/org-apache-slider/storm1
>> >
>> >Doesn't these processes correspond to slider agent?
>> >
>> >On Tue, Apr 28, 2015 at 1:32 AM, Chackravarthy Esakkimuthu <
>> >chaku.mi...@gmail.com> wrote:
>> >
>> >> 1) slider create storm1
>> >> --- it started all the components, SliderAM, slider agents. And
>>storm UI
>> >> was accessible. Also manually logged into each host and verified all
>> >> components are up and running.
>> >>
>> >> 2) slider stop storm1
>> >> --- it stopped SliderAM
>> >> --- but all the components are running along with slider agents. And
>> >>storm
>> >> UI was accessible.
>> >>
>> >> 3) slider start storm1 (RM UI was less responsive during this time)
>> >> --- it started another sliderAM and other set of storm components and
>> >> slider agents also. And able to access storm UI in another host.
>> >>
>> >> So now, actually two storm cluster is running though I used same name
>> >> "storm1"
>> >>
>> >> On Tue, Apr 28, 2015 at 1:23 AM, Gour Saha 
>> >>wrote:
>> >>
>> >>> Hmm.. Interesting.
>> >>>
>> >>> Is it possible to run "ps -ef | grep storm" before and after the
>>storm1
>> >>> app is started and send the output?
>> >>>
>> >>> -Gour
>> >>>
>> >>> On 4/27/15, 12:48 PM, "Chackravarthy Esakkimuthu"
>> >>>
>> >>> wrote:
>> >>>
>> >>> >No, the processes are not old one, because it shows the class path
>> >>>which
>> >>> >has folder names corresponds to newly launched application id.
>>(also
>

Re: Doubts on stop/destroy the application instance

2015-04-28 Thread Gour Saha

Can you send us the complete-config dump?

-Gour

On 4/28/15, 2:45 AM, "Chackravarthy Esakkimuthu" 
wrote:

>yes this is the config taken by slider also.
>
>http://host2:8088/proxy/application_1428575950531_0016/ws/v1/slider/publis
>her/slider/complete-config
>
>yarn.nodemanager.sleep-delay-before-sigkill.ms: "250"
>
>its default value coming from yarn-default.
>We have not configured it in yarn-site.
>
>On Tue, Apr 28, 2015 at 3:03 PM, Chackravarthy Esakkimuthu <
>chaku.mi...@gmail.com> wrote:
>
>> Following is the config which I get from RM UI,
>>
>> http://host2:8088/conf
>>
>> <property>
>> <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
>> <value>250</value>
>> <source>yarn-default.xml</source>
>> </property>
>>
>> On Tue, Apr 28, 2015 at 2:50 PM, Steve Loughran 
>> wrote:
>>
>>>
>>> > On 28 Apr 2015, at 10:07, Chackravarthy Esakkimuthu <
>>> chaku.mi...@gmail.com> wrote:
>>> >
>>> > sure, will send you the logs.
>>> >
>>> > And the same pattern follows for hbase installation also.
>>> > 'stop' command stops only SliderAM.
>>> > 'destroy' command stops HMaster and RegionServer only.. HBASE_REST
>>>and
>>> > THRIFT_2 still running after destroy command, And slider agents
>>>running
>>> in
>>> > all 4 hosts where container was launched.
>>> >
>>>
>>>
>>>
>>> do you have YARN set up to actually kill processes when the containers
>>> are released.?
>>>
>>> For example:
>>>
>>> 
>>> 
>>>   yarn.nodemanager.sleep-delay-before-sigkill.ms
>>>   3
>>> 
>>>
>>
>>

Re: Doubts on stop/destroy the application instance

2015-04-29 Thread Gour Saha

Thanks Chackra for providing the Slider and NM logs and configs of the
cluster. From the logs it seems like a YARN bug, so I went ahead and filed
one. I will follow up with the YARN team to see what is causing this -

https://issues.apache.org/jira/browse/YARN-3561


-Gour

On 4/28/15, 7:48 AM, "Gour Saha"  wrote:

>Can you send us the complete-config dump?
>
>-Gour
>
>On 4/28/15, 2:45 AM, "Chackravarthy Esakkimuthu" 
>wrote:
>
>>yes this is the config taken by slider also.
>>
>>http://host2:8088/proxy/application_1428575950531_0016/ws/v1/slider/publi
>>s
>>her/slider/complete-config
>>
>>yarn.nodemanager.sleep-delay-before-sigkill.ms: "250"
>>
>>its default value coming from yarn-default.
>>We have not configured it in yarn-site.
>>
>>On Tue, Apr 28, 2015 at 3:03 PM, Chackravarthy Esakkimuthu <
>>chaku.mi...@gmail.com> wrote:
>>
>>> Following is the config which I get from RM UI,
>>>
>>> http://host2:8088/conf
>>>
>>> 
>>> yarn.nodemanager.sleep-delay-before-sigkill.ms
>>> 250
>>> yarn-default.xml
>>> 
>>>
>>> On Tue, Apr 28, 2015 at 2:50 PM, Steve Loughran
>>>
>>> wrote:
>>>
>>>>
>>>> > On 28 Apr 2015, at 10:07, Chackravarthy Esakkimuthu <
>>>> chaku.mi...@gmail.com> wrote:
>>>> >
>>>> > sure, will send you the logs.
>>>> >
>>>> > And the same pattern follows for hbase installation also.
>>>> > 'stop' command stops only SliderAM.
>>>> > 'destroy' command stops HMaster and RegionServer only.. HBASE_REST
>>>>and
>>>> > THRIFT_2 still running after destroy command, And slider agents
>>>>running
>>>> in
>>>> > all 4 hosts where container was launched.
>>>> >
>>>>
>>>>
>>>>
>>>> do you have YARN set up to actually kill processes when the containers
>>>> are released.?
>>>>
>>>> For example:
>>>>
>>>> 
>>>> 
>>>>   yarn.nodemanager.sleep-delay-before-sigkill.ms
>>>>   3
>>>> 
>>>>
>>>
>>>
>

Re: Doubts on stop/destroy the application instance

2015-04-29 Thread Gour Saha

Unfortunately we haven't reproduced this issue in the envs we usually test
on. We might have to create an exact replica of your cluster (with RM HA,
NN HA, OS version, # of nodes, etc.) to be able to reproduce it. The YARN
team is looking into this issue.

By the way, what is the OS and version of the nodes in your cluster?

-Gour

On 4/29/15, 10:49 AM, "Chackravarthy Esakkimuthu" 
wrote:

>sure Gour, Thanks for helping out.
>Do you also see these kind of issues? Is it reproducible for you as well?
>
>On Wed, Apr 29, 2015 at 8:58 PM, Gour Saha  wrote:
>
>> Thanks Chackra for providing the Slider and NM logs and configs of the
>> cluster. From the logs it seems like a YARN bug, so I went ahead and
>>filed
>> one. I will follow up with the YARN team to see what is causing this -
>>
>> https://issues.apache.org/jira/browse/YARN-3561
>>
>>
>> -Gour
>>
>> On 4/28/15, 7:48 AM, "Gour Saha"  wrote:
>>
>> >Can you send us the complete-config dump?
>> >
>> >-Gour
>> >
>> >On 4/28/15, 2:45 AM, "Chackravarthy Esakkimuthu"
>>
>> >wrote:
>> >
>> >>yes this is the config taken by slider also.
>> >>
>> >>
>> 
>>http://host2:8088/proxy/application_1428575950531_0016/ws/v1/slider/publi
>> >>s
>> >>her/slider/complete-config
>> >>
>> >>yarn.nodemanager.sleep-delay-before-sigkill.ms: "250"
>> >>
>> >>its default value coming from yarn-default.
>> >>We have not configured it in yarn-site.
>> >>
>> >>On Tue, Apr 28, 2015 at 3:03 PM, Chackravarthy Esakkimuthu <
>> >>chaku.mi...@gmail.com> wrote:
>> >>
>> >>> Following is the config which I get from RM UI,
>> >>>
>> >>> http://host2:8088/conf
>> >>>
>> >>> 
>> >>> yarn.nodemanager.sleep-delay-before-sigkill.ms
>> >>> 250
>> >>> yarn-default.xml
>> >>> 
>> >>>
>> >>> On Tue, Apr 28, 2015 at 2:50 PM, Steve Loughran
>> >>>
>> >>> wrote:
>> >>>
>> >>>>
>> >>>> > On 28 Apr 2015, at 10:07, Chackravarthy Esakkimuthu <
>> >>>> chaku.mi...@gmail.com> wrote:
>> >>>> >
>> >>>> > sure, will send you the logs.
>> >>>> >
>> >>>> > And the same pattern follows for hbase installation also.
>> >>>> > 'stop' command stops only SliderAM.
>> >>>> > 'destroy' command stops HMaster and RegionServer only..
>>HBASE_REST
>> >>>>and
>> >>>> > THRIFT_2 still running after destroy command, And slider agents
>> >>>>running
>> >>>> in
>> >>>> > all 4 hosts where container was launched.
>> >>>> >
>> >>>>
>> >>>>
>> >>>>
>> >>>> do you have YARN set up to actually kill processes when the
>>containers
>> >>>> are released.?
>> >>>>
>> >>>> For example:
>> >>>>
>> >>>> 
>> >>>> 
>> >>>>   yarn.nodemanager.sleep-delay-before-sigkill.ms
>> >>>>   3
>> >>>> 
>> >>>>
>> >>>
>> >>>
>> >
>>
>>

Re: Doubts on stop/destroy the application instance

2015-04-29 Thread Gour Saha

Thanks. I updated the YARN bug with the OS info.

I saw that RM HA is disabled. By the way, the YARN team has submitted a
patch for the RM HA issue (https://issues.apache.org/jira/browse/SLIDER-846)
as part of the YARN bug https://issues.apache.org/jira/browse/YARN-2605.

If you want, I can provide a patch for you to test, provided you are okay
with getting a jar from us.

-Gour

On 4/29/15, 11:18 AM, "Chackravarthy Esakkimuthu" 
wrote:

>OS installed is debian 7.
>And as I was facing issue (components were not starting) with RM HA
>enabled, I am testing it with RM HA disabled only. And yes, still NN HA is
>still enabled in the cluster.
>
>On Wed, Apr 29, 2015 at 11:37 PM, Gour Saha  wrote:
>
>> Unfortunately we haven't reproduced this issue in the envs we usually
>>test
>> on. We might have to create an exact replica of your cluster (with RM
>>HA,
>> NN HA, OS version, # of nodes, etc.) to be able to reproduce it. The
>>YARN
>> team is looking into this issue.
>>
>> By the way, what is the OS and version of the nodes in your cluster?
>>
>> -Gour
>>
>> On 4/29/15, 10:49 AM, "Chackravarthy Esakkimuthu"
>>
>> wrote:
>>
>> >sure Gour, Thanks for helping out.
>> >Do you also see these kind of issues? Is it reproducible for you as
>>well?
>> >
>> >On Wed, Apr 29, 2015 at 8:58 PM, Gour Saha 
>>wrote:
>> >
>> >> Thanks Chackra for providing the Slider and NM logs and configs of
>>the
>> >> cluster. From the logs it seems like a YARN bug, so I went ahead and
>> >>filed
>> >> one. I will follow up with the YARN team to see what is causing this
>>-
>> >>
>> >> https://issues.apache.org/jira/browse/YARN-3561
>> >>
>> >>
>> >> -Gour
>> >>
>> >> On 4/28/15, 7:48 AM, "Gour Saha"  wrote:
>> >>
>> >> >Can you send us the complete-config dump?
>> >> >
>> >> >-Gour
>> >> >
>> >> >On 4/28/15, 2:45 AM, "Chackravarthy Esakkimuthu"
>> >>
>> >> >wrote:
>> >> >
>> >> >>yes this is the config taken by slider also.
>> >> >>
>> >> >>
>> >>
>> >>
>> 
>>http://host2:8088/proxy/application_1428575950531_0016/ws/v1/slider/publi
>> >> >>s
>> >> >>her/slider/complete-config
>> >> >>
>> >> >>yarn.nodemanager.sleep-delay-before-sigkill.ms: "250"
>> >> >>
>> >> >>its default value coming from yarn-default.
>> >> >>We have not configured it in yarn-site.
>> >> >>
>> >> >>On Tue, Apr 28, 2015 at 3:03 PM, Chackravarthy Esakkimuthu <
>> >> >>chaku.mi...@gmail.com> wrote:
>> >> >>
>> >> >>> Following is the config which I get from RM UI,
>> >> >>>
>> >> >>> http://host2:8088/conf
>> >> >>>
>> >> >>> 
>> >> >>> yarn.nodemanager.sleep-delay-before-sigkill.ms
>> >> >>> 250
>> >> >>> yarn-default.xml
>> >> >>> 
>> >> >>>
>> >> >>> On Tue, Apr 28, 2015 at 2:50 PM, Steve Loughran
>> >> >>>
>> >> >>> wrote:
>> >> >>>
>> >> >>>>
>> >> >>>> > On 28 Apr 2015, at 10:07, Chackravarthy Esakkimuthu <
>> >> >>>> chaku.mi...@gmail.com> wrote:
>> >> >>>> >
>> >> >>>> > sure, will send you the logs.
>> >> >>>> >
>> >> >>>> > And the same pattern follows for hbase installation also.
>> >> >>>> > 'stop' command stops only SliderAM.
>> >> >>>> > 'destroy' command stops HMaster and RegionServer only..
>> >>HBASE_REST
>> >> >>>>and
>> >> >>>> > THRIFT_2 still running after destroy command, And slider
>>agents
>> >> >>>>running
>> >> >>>> in
>> >> >>>> > all 4 hosts where container was launched.
>> >> >>>> >
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> do you have YARN set up to actually kill processes when the
>> >>containers
>> >> >>>> are released.?
>> >> >>>>
>> >> >>>> For example:
>> >> >>>>
>> >> >>>> 
>> >> >>>> 
>> >> >>>>   yarn.nodemanager.sleep-delay-before-sigkill.ms
>> >> >>>>   3
>> >> >>>> 
>> >> >>>>
>> >> >>>
>> >> >>>
>> >> >
>> >>
>> >>
>>
>>

Planning for Apache Slider 0.80.0 release

2015-04-30 Thread Gour Saha

Folks,

I am planning to release Slider 0.80.0 by May 15, 2015.

I am setting the code freeze date to be Monday, May 4. On Tuesday, May 5 I will 
cut the release branch.

After the release branch is cut, only blocker bug fixes and test-only code will 
be accepted until Wednesday, May 6.

-Gour

Re: Packaging new apps

2015-05-07 Thread Gour Saha

Hi Jean,

Please see answers inline.

-Gour

On 5/6/15, 6:16 AM, "Jean-Baptiste Note" wrote:

Hi folks,

Currently we're using Chef in our organization to deploy a lot of
infrastructure services around Hadoop. Of course it makes a lot of sense to
offer these as self-services on YARN using slider, but i'm looking at a
number of challenges. So please forgive the broad range of questions :)

I'm specifically interested in deploying the following applications:
* HTTPFS service (see https://github.com/jbnote/httpfs-slider) & helpers
(nginx)
* Opentsdb & helpers (varnish)
* kafka (I had a look at koya)
* druid
* storm (fine, thanks !)
* hbase (fine, thanks !)

I'm facing a lot of issues with those services which are not yet packaged
correctly:

* httpfs/opentsdb are not released as standalone tarballs, contrary to all
services currently packaged. So i've butchered a tarball from Cloudera
RPMs, which is not satisfactory. How would you go about handling this ?

I'm not sure exactly what you mean by "handling this". If you are referring
to a way to create a Slider package of an app in rpm format, then there are
challenges: an rpm install requires root access, and YARN does not allow
that. If you are referring to an issue you are facing with deploying the Slider
app (now that you have created a tarball), can you share what issues you are
facing?

You might also want to take a look at this tomcat Slider package. Caution: it
is not ready for prime time and has a few issues that need to be resolved. But
the scripts and metadata files might be a helpful reference.
https://issues.apache.org/jira/browse/SLIDER-809
https://github.com/apache/incubator-slider/tree/feature/SLIDER-809-tomcat-app-package/app-packages/tomcat

* KOYA has been talked about a lot, however the source i'm looking at (
https://github.com/DataTorrent/koya) is kind of disappointing, and activity
is a bit low -- would anyone know if dataTorrent is still committed to the
project ?

What issues are you facing with KOYA? DataTorrent gave a presentation of KOYA,
and Slider seems to have fit their needs so far. They wanted a few features:
data locality (strict placement), which will be in the 0.80.0 release, and
unique ids, which still need some work.

Last but not least, I'm wondering if there would already be a plan to
expose somehow (through an internal or an external service) the registry
through DNS (that's what we really use for service location for HTTPFS &
OpenTSDB). A bash polling script would certainly be sufficient for our
needs for now, but longer-term, we'd need to have a more robust solution.

The registry and the REST APIs on the registry come directly from YARN -
https://issues.apache.org/jira/browse/YARN-913
https://issues.apache.org/jira/browse/YARN-2948
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/registry/yarn-registry.html

Thanks a lot, kind regards,
JB

[VOTE] Apache Slider Incubating Release 0.80.0-incubating

2015-05-08 Thread Gour Saha

Hello,

This is a call for a vote on Apache Slider 0.80.0-incubating release.

This is a source+binary release.

The list of all issues fixed:
http://s.apache.org/539

Staged artifacts:
https://repository.apache.org/content/repositories/orgapacheslider-1006/org/apache/slider

Git source:
https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h=d7e3449fa649dbb88388329724334f2d1aac2869
SHA1: d7e3449fa649dbb88388329724334f2d1aac2869
Tag: slider-0.80.0-incubating

PGP key:
http://pgp.mit.edu/pks/lookup?op=vindex&search=gourks...@apache.org

Build/test instructions at:
http://slider.incubator.apache.org/developing/building.html

Vote will be open for 72 hours

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

To start, here's my vote: +1

-Gour

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.

[RESULT] [VOTE] Apache Slider Incubating Release 0.80.0-incubating

2015-05-12 Thread Gour Saha

The vote passes with 4 +1 votes

Steve Loughran +1 (binding)
Ted Yu +1
Josh Elser +1
Gour Saha  +1

Mail Thread:
http://s.apache.org/MNx

-Gour


On 5/8/15, 8:14 AM, "Gour Saha" wrote:

Hello,

This is a call for a vote on Apache Slider 0.80.0-incubating release.

This is a source+binary release.

The list of all issues fixed:
http://s.apache.org/539

Staged artifacts:
https://repository.apache.org/content/repositories/orgapacheslider-1006/org/apache/slider

Git source:
https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h=d7e3449fa649dbb88388329724334f2d1aac2869
SHA1: d7e3449fa649dbb88388329724334f2d1aac2869
Tag: slider-0.80.0-incubating

PGP key:
http://pgp.mit.edu/pks/lookup?op=vindex&search=gourks...@apache.org

Build/test instructions at:
http://slider.incubator.apache.org/developing/building.html

Vote will be open for 72 hours

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

To start, here's my vote: +1

-Gour


Re: [RESULT] [VOTE] Apache Slider Incubating Release 0.80.0-incubating

2015-05-12 Thread Gour Saha

Rajesh,

Really appreciate your time and vote. Typically we wait for 72 hours and
if at least 3 votes are in by then, we publish the results.

Nevertheless, your vote is very valuable, and we look forward to your
participation in upcoming releases.

-Gour

On 5/12/15, 2:24 PM, "Rajesh Kartha"  wrote:

>Thanks!
>
>A bit late...but here is my vote:
>
>+1 (non-binding)
>
>-  ran 'mvn package'  (with unit tests)
>-  created an slider hbase (0.98.8 and 1.1)  package
>-  installed and tested the package with slider (start, stop, flex etc)
>
>No new issues encountered.
>
>-Rajesh
>
>
>On Tue, May 12, 2015 at 12:59 PM, Gour Saha  wrote:
>
>> The vote passes with 4 +1 votes
>>
>> Steve Loughran +1 (binding)
>> Ted Yu +1
>> Josh Elser +1
>> Gour Saha  +1
>>
>> Mail Thread:
>> http://s.apache.org/MNx
>>
>> -Gour
>>
>>
>> On 5/8/15, 8:14 AM, "Gour Saha" wrote:
>>
>> Hello,
>>
>> This is a call for a vote on Apache Slider 0.80.0-incubating release.
>>
>> This is a source+binary release.
>>
>> The list of all issues fixed:
>> http://s.apache.org/539
>>
>> Staged artifacts:
>>
>> 
>>https://repository.apache.org/content/repositories/orgapacheslider-1006/o
>>rg/apache/slider
>>
>> Git source:
>>
>> 
>>https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h
>>=d7e3449fa649dbb88388329724334f2d1aac2869
>> SHA1: d7e3449fa649dbb88388329724334f2d1aac2869
>> Tag: slider-0.80.0-incubating
>>
>> PGP key:
>> http://pgp.mit.edu/pks/lookup?op=vindex&search=gourks...@apache.org
>>
>> Build/test instructions at:
>> http://slider.incubator.apache.org/developing/building.html
>>
>> Vote will be open for 72 hours
>>
>> [ ] +1 approve
>> [ ] +0 no opinion
>> [ ] -1 disapprove (and reason why)
>>
>> To start, here's my vote: +1
>>
>> -Gour
>>
>>

Re: Slider stop not working

2015-05-13 Thread Gour Saha

Hi Chackra,

You are absolutely right. The workaround that I was planning to work on should 
be implemented as a neat backup solution for when YARN fails to shut down 
containers (in this and certain other possible scenarios).

In fact, we filed a bug a long time back along the same lines, predicting 
this issue (for another scenario) - 
https://issues.apache.org/jira/browse/SLIDER-479

As you had expressed interest in contributing to Slider, I was wondering if you 
would have some cycles and be willing to take this up. You can work on the 
develop branch and use SLIDER-479. The Slider develop branch is compatible with 
HDP 2.2, so we can easily test the fix in your cluster.

Let me know, and I can help all along the way.

In case you have some cycles, here are some pointers that might help you to 
approach this problem -

1. Slider has a notion of sending a terminate command to the agent, which the 
agent obeys and gracefully brings itself down
2. In this scenario, since the Slider AM goes down, the agents can look for a node 
in Zookeeper (when they lose connection with the AM) and shut themselves down if 
the node is missing (using the terminate code path or something more elegant)
3. Of course this Zookeeper node needs to be created by the Slider AM at the 
beginning of create cluster and then deleted just before the AM shuts down as 
part of the stop command (we might have to look into the YARN pre-emption 
scenario, but we can ignore this for now). We do not want to delete this in the 
AM failure/restart scenario.
4. Any other better ideas or elegant solutions you can think of
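
A minimal sketch of pointers 2 and 3, assuming a hypothetical marker path and 
an in-memory stand-in for Zookeeper (a real implementation would use a 
Zookeeper client; none of these names exist in Slider today):

```python
class FakeZookeeper:
    """In-memory stand-in for a Zookeeper client (assumption, illustration only)."""

    def __init__(self):
        self._nodes = set()

    def create(self, path):
        self._nodes.add(path)

    def delete(self, path):
        self._nodes.discard(path)

    def exists(self, path):
        return path in self._nodes


# Hypothetical marker path; the real registry layout may differ.
MARKER = "/registry/users/yarn/services/org-apache-slider/storm_1/alive"


def am_create_cluster(zk):
    # Pointer 3: the AM creates the marker at the beginning of create cluster.
    zk.create(MARKER)


def am_stop(zk):
    # Pointer 3: the AM deletes the marker only on an explicit stop,
    # never on AM failure/restart.
    zk.delete(MARKER)


def agent_should_terminate(zk, am_reachable):
    # Pointer 2: an agent that lost the AM shuts down only if the marker is gone.
    if am_reachable:
        return False
    return not zk.exists(MARKER)
```

With this shape, an AM crash leaves the marker in place, so agents keep running 
and can reconnect after an AM restart; only an explicit stop removes the marker.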

On a side note, can you test this on Debian 7 -
Go to one of the nodes where any of the agents is running (say NIMBUS or any 
other component) and then issue a SIGTERM to the main.py process (kill -s TERM 
<pid>). What do you see in the slider-agent.log after that? What happens to all 
the processes in this container? Are they still running?

The <pid> is that of the bash main.py process (not the python main.py child 
process).

So if the process is something like this -
yarn  6007  6003  0 19:43 ?00:00:00 /bin/bash -c python 
./infra/agent/slider-agent/agent/main.py --label 
container_1431413628146_0003_01_02___NIMBUS --zk-quorum 
c6408.ambari.apache.org:2181 --zk-reg-path 
/registry/users/yarn/services/org-apache-slider/storm_1 > 
/hadoop/yarn/log/application_1431413628146_0003/container_1431413628146_0003_01_02/slider-agent.out
 2>&1

You need to issue -
kill -s TERM 6007
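
If it helps, the same test can be scripted. This is only a sketch: it assumes 
`ps -ef` style rows (UID PID PPID C STIME TTY TIME CMD), and the helper names 
are made up for illustration.

```python
import os
import signal


def parse_ps_line(line):
    """Split one `ps -ef` row into uid, pid, ppid and command; None if malformed."""
    fields = line.split(None, 7)
    if len(fields) < 8:
        return None
    return {"uid": fields[0], "pid": int(fields[1]),
            "ppid": int(fields[2]), "cmd": fields[7]}


def find_wrapper_pid(ps_lines):
    """Return the pid of the bash process that launched the agent's main.py."""
    for line in ps_lines:
        row = parse_ps_line(line)
        if row is None:
            continue
        # The bash wrapper, not the python child, is the process to signal.
        if row["cmd"].startswith("/bin/bash -c python") and "agent/main.py" in row["cmd"]:
            return row["pid"]
    return None


def send_term(pid):
    # Equivalent to: kill -s TERM <pid>
    os.kill(pid, signal.SIGTERM)
```

Feed find_wrapper_pid() the lines from `ps -ef | grep main.py` and pass the 
result to send_term() on the node itself.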

-Gour

On 5/13/15, 1:38 AM, "Chackravarthy Esakkimuthu" <chaku.mi...@gmail.com> wrote:

Thanks for your response steve,

I was thinking that SliderAgent would receive 'stop' command from SliderAM
to kill the components spawned by those agents. And yeah this might be
specific to debian installation as others in the group are not facing this
issue.

On Tue, May 12, 2015 at 1:50 PM, Steve Loughran <ste...@hortonworks.com> wrote:

> On 12 May 2015, at 08:42, Chackravarthy Esakkimuthu <
chaku.mi...@gmail.com> wrote:
>
> Starting a new thread,
>
> already JIRA filed for the same by Gour,
> https://issues.apache.org/jira/browse/YARN-3561
>
> Slider stop does not stop the components started by slider, instead it
> stops only SliderAM, and even SliderAgents did not receive 'stop'
command.
> (it happens with debian 7) and tested with 0.70.1 as well as 'develop'
> branch code.
>
> Today I just came across the following mail archive,
>
>
http://mail-archives.apache.org/mod_mbox/incubator-slider-dev/201503.mbox/%3c1426350060949.97...@hortonworks.com%3E
>
> <
>
> *What is not implemented is an explicit call to "stop function in the
> python scripts".
>
> What I was referring to that an attempt is made by the Agent to call
> stop in the python script
> but it is not guaranteed. The reason it is not guaranteed is that the
> call to stop() and kill
> of the containers by YARN is not co-ordinated.
>
> In summary, the ability to call stop() functions in the python script
> is not implemented.
> Its in the plan though.*
>
>>>
>
> Does this still exists?

The idea of the stop() command is to actually offer a best-effort clean
shutdown for containers. Currently the AM just directly tells YARN to
destroy a container. The agent doesn't get told, nor does the application
(that's implicit from the agent).

YARN is expected to "kill" then, if there is no response, "kill -9" the
agent process. Which it does for the hosts we test on, linux, OSX and
windows.

If something is up with your YARN+debian installation, we believe that it
is related to whether those container kill events are coming out from the
node manager.

Re: Container recovery on working on CDH with yarn.component.placement.policy=1

2015-05-13 Thread Gour Saha

Can you check the resources (memory, cpu) available in the host, after
killing the container? Is it freed? Can you hit the RM UI and share what
you see in the "Cluster Metrics" table for that node?

Also, if possible please share your resources.json.

-Gour

On 5/12/15, 9:34 AM, "Thomas Weise"  wrote:

>We are testing KOYA on CDH 5.4. We see that after killing the container
>Slider as expected will ask for the same host. The request is never filled
>and the container cannot be redeployed. We see this behavior on CDH with
>DataTorrent also, it looks like a CDH bug.
>
>Anyone else trying to run Slider on CDH and sees the same behavior? Any
>insight on whether that is a CDH configuration issue or fair scheduler
>bug?
>
>Thanks,
>Thomas

Re: [RESULT] [VOTE] Apache Slider Incubating Release 0.80.0-incubating

2015-05-14 Thread Gour Saha

The general incubator vote is going on now (not sure if you are subscribed
to that DL). It takes 3 days. The 3 days is scheduled to end by EOD today.
Once it is approved, we typically update the download link and send out an
ANNOUNCE email the day after which would be Friday (tomorrow).

-Gour

On 5/14/15, 10:59 AM, "Rajesh Kartha"  wrote:

>Thanks Gour.
>
>Was curious, once the voting is complete when does newer Slider release
>typically become available for download at:
>http://slider.incubator.apache.org/
>
>It show 0.70.1 currently.
>
>
>-Rajesh
>
>
>
>On Tue, May 12, 2015 at 2:30 PM, Gour Saha  wrote:
>
>> Rajesh,
>>
>> Really appreciate your time and vote. Typically we wait for 72 hours and
>> if at least 3 votes are in by then, we publish the results.
>>
>> Nevertheless, your vote is very valuable and look forward to your
>> participation in upcoming releases.
>>
>> -Gour
>>
>> On 5/12/15, 2:24 PM, "Rajesh Kartha"  wrote:
>>
>> >Thanks!
>> >
>> >A bit late...but here is my vote:
>> >
>> >+1 (non-binding)
>> >
>> >-  ran 'mvn package'  (with unit tests)
>> >-  created an slider hbase (0.98.8 and 1.1)  package
>> >-  installed and tested the package with slider (start, stop, flex etc)
>> >
>> >No new issues encountered.
>> >
>> >-Rajesh
>> >
>> >
>> >On Tue, May 12, 2015 at 12:59 PM, Gour Saha 
>> wrote:
>> >
>> >> The vote passes with 4 +1 votes
>> >>
>> >> Steve Loughran +1 (binding)
>> >> Ted Yu +1
>> >> Josh Elser +1
>> >> Gour Saha  +1
>> >>
>> >> Mail Thread:
>> >> http://s.apache.org/MNx
>> >>
>> >> -Gour
>> >>
>> >>
>> >> On 5/8/15, 8:14 AM, "Gour Saha" <gs...@hortonworks.com> wrote:
>> >>
>> >> Hello,
>> >>
>> >> This is a call for a vote on Apache Slider 0.80.0-incubating release.
>> >>
>> >> This is a source+binary release.
>> >>
>> >> The list of all issues fixed:
>> >> http://s.apache.org/539
>> >>
>> >> Staged artifacts:
>> >>
>> >> https://repository.apache.org/content/repositories/orgapacheslider-1006/org/apache/slider
>> >>
>> >> Git source:
>> >>
>> >> https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h=d7e3449fa649dbb88388329724334f2d1aac2869
>> >> SHA1: d7e3449fa649dbb88388329724334f2d1aac2869
>> >> Tag: slider-0.80.0-incubating
>> >>
>> >> PGP key:
>> >> http://pgp.mit.edu/pks/lookup?op=vindex&search=gourks...@apache.org
>> >>
>> >> Build/test instructions at:
>> >> http://slider.incubator.apache.org/developing/building.html
>> >>
>> >> Vote will be open for 72 hours
>> >>
>> >> [ ] +1 approve
>> >> [ ] +0 no opinion
>> >> [ ] -1 disapprove (and reason why)
>> >>
>> >> To start, here's my vote: +1
>> >>
>> >> -Gour
>> >>
>> >>
>> >>
>>
>>

Re: Packaging new apps

2015-05-15 Thread Gour Saha

I might be wrong, but I sense there is a requirement here, where Slider
needs to accept custom application-specific config files in their original
raw format (like properties, xml, json, yaml, etc.) in addition to
appConfig.json. Then it is expected to merge them with appConfig.json and
send the complete property bag down to the application containers.
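
To illustrate the kind of merge I have in mind, here is a minimal sketch. It
assumes a Java-style .properties file with simple key=value lines and a flat
dictionary of overrides standing in for the relevant part of appConfig.json.
The function names are hypothetical, not Slider APIs.

```python
def parse_properties(text):
    """Parse simple key=value lines; comments and blank lines are skipped.

    Simplification: only '=' is treated as a separator, and line
    continuations are not handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def merge_property_bag(defaults_text, overrides):
    """Start from the shipped defaults, then apply the appConfig overrides."""
    merged = parse_properties(defaults_text)
    merged.update(overrides)
    return merged
```

The defaults that ship with the application stay untouched unless a key is
explicitly set in the overrides, and keys that exist only in the overrides
are added to the bag.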

If that is true, or even if I got it all wrong, it would be great if you
can file JIRAs for what you are looking for. It is good to have these kinds
of gaps and ideas captured in JIRAs, so that we can make Slider better.


Siyuan,
The instance tag feature has been there since 0.60. Check
https://issues.apache.org/jira/browse/SLIDER-463.

-Gour

On 5/11/15, 10:41 PM, "Thomas Weise"  wrote:

>Jean,
>
>We pulled in your changes and added modifications on top of it. It appears
>we agree that we should not force the user to redefine the default values
>that ship with server.properties. Please see whether the properties merge
>as implemented works on your environment or not. If not, what is the
>Python
>version?
>
>We can find an alternative solution to in-place edit of server properties
>if and when needed. The file is an argument to the start script, hence we
>can do a copy before merge if necessary.
>
>Thomas
>
>
>On Mon, May 11, 2015 at 3:26 PM, hsy...@gmail.com 
>wrote:
>
>> Hi Jean,
>>
>> Thanks for the change, using instance tag(is it a new feature in the
>>latest
>> version? I didn't see it in the older slider versions) is a really good
>> idea.  it might be good for other's to have a template but not for
>>kafka.
>> Kafka is evolving in quite fast pace. I've seen many property key/val
>> change in last several releases. Our method is keep most properties
>>default
>> and only override the one declared in appConfig.json which is actually
>> supported in current python script(maybe need some change for the latest
>> slider).
>>
>> And  Kafka broker is bundled with local disk once it's launched so in
>>the
>> real world there would be at most one instance for each NM.
>>
>> Best,
>> Siyuan
>>
>>
>>
>> On Mon, May 11, 2015 at 10:16 AM, Jean-Baptiste Note 
>> wrote:
>>
>> > Hi Thomas,
>> >
>> > According to kafka's documentation:
>> > http://kafka.apache.org/07/configuration.html there should be a
>>default
>> > value for any added property; I would expect the provided
>> server.properties
>> > file to actually reflect those default values.
>> > Therefore, I'd look twice before overconstraining the problem, and
>>would
>> > just generate the file for those and only those dictionary values that
>> have
>> > been set in the appConfig (which currently, my code does not, it
>> configures
>> > too many properties statically, but it can be arranged), relying on
>>the
>> > default properties for the rest.
>> >
>> > If there's really a case to have all properties at hand, I could:
>> > * parse the properties file provided in the tarball
>> > * re-generate the whole conf file with the parsed + overrides
>> >
>> > This, in order to allow for *added* properties (which the current
>> schemes,
>> > either mine or yours, does not look to allow) AND ultimately, allow
>>for
>> the
>> > whole tarball installation to be switched to read-only (which could
>>allow
>> > them to be shared among instances running on the same NM; I don't
>>know if
>> > slider currently does this kind of optimization).
>> >
>> > Maybe guidance from people more familiar with slider than us would be
>> > needed here :)
>> >
>> > Kind regards,
>> > JB
>> >
>>

Re: slider may report

2015-05-15 Thread Gour Saha

+1 from me with some additional points -

For the section "How has the community developed since the last report?", can 
we modify it as -

We are seeing a lot of traction in the community over the improvement in 
deployment and management of applications using Slider. A good number of emails 
are pouring into the dev list, primarily on creation of Slider packages for newer 
applications and willingness to contribute code. A few presentations and meetups 
were organized by the community, socializing their Slider-ized applications. 
Occasional patches are being submitted and a number of features added to the 
recent releases were driven by user needs.

If the above paragraph is acceptable, then can we get rid of point 2 in "Three 
most important issues to address in the move towards graduation"? If not, then 
can we at least modify it to be just -
2. Achieving broader adoption of the existing code

For the section, "How has the project developed since the last report?" can we 
add at the end -

We completed release 0.61.0-incubating in February with more complex release 
artifacts than we had released previously. We also released 0.70.1-incubating 
in March. Planning is under way for releasing 0.80 in May 2015, with 64 JIRAs 
resolved and critical business application features like rolling upgrade, 
co-processor support for dynamic addition of libraries, and support for Docker 
containers.

-Gour

On 5/6/15, 6:23 AM, "Ted Yu" <yuzhih...@gmail.com> wrote:

+1

On Wed, May 6, 2015 at 6:05 AM, Jon Maron <jma...@hortonworks.com> wrote:

+1

> On May 6, 2015, at 8:58 AM, Billie Rinaldi <billie.rina...@gmail.com> wrote:
>
> How does this draft look?
>
> Slider
>
> Slider is a collection of tools and technologies to package, deploy, and
> manage long running applications on Apache Hadoop YARN clusters.
>
> Slider has been incubating since 2014-04-29.
>
> Three most important issues to address in the move towards graduation:
>
>  1. Building a diverse developer and user community
>  2. Achieving broader adoption of the existing code and slider-deployable
> applications (examples: HBase, Accumulo)
>  3. Making slider better at deploying other applications, so improving
> takeup.
>
> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> aware of?
>
>  None
>
> How has the community developed since the last report?
>
>  We are getting some questions from users on the list and an occasional
>  patch submitted.  A number of features added to the recent releases
>  were driven by user needs.
>
> How has the project developed since the last report?
>
>  We completed release 0.61.0-incubating in February with more complex
>  release artifacts than we had released previously.  We also released
>  0.70.1-incubating in March.  Planning is under way for releasing 0.80 in
>  May 2015.
>
> Date of last release:
>
>  2015-03-31: Slider 0.70.1-incubating
>  2015-02-19: Slider 0.61.0-incubating
>
> When were the last committers or PMC members elected?
>
>  2014-09-27: Gour Saha, committer and PPMC member

Re: slider registry --name cl1 --conf hbase --format xml cannot work with java 1.8

2015-05-15 Thread Gour Saha

Can you file a JIRA, please?

-Gour

On 5/15/15, 7:13 PM, "skaterQiang"  wrote:

>Hello guys:
>slider registry --name cl1 --conf hbase --format xml cannot work
>with java 1.8
>It will throw connect in progress error.
>https://bugs.openjdk.java.net/browse/JDK-8029127
>https://java.net/jira/browse/JERSEY-730
>
>
>I have checked code, it is due to jersey 1.9 cannot work with java 1.8
>And after I change the jdk to 1.7 which oracle does not support now, it
>works fine.
>And I have checked 0.6.0, 0.7.1 and 0.8 slider pom, we all use jersey 1.9.
>So I think we need to fix it.
>
>
>Thanks

Re: slider registry --name cl1 --conf hbase --format xml cannot work with java 1.8

2015-05-15 Thread Gour Saha

Thank you, we will look into this.

-Gour

On 5/15/15, 8:36 PM, "xqfly...@163.com"  wrote:

>logged slider-878
>
>
>
>On 2015-05-16 10:32 , Gour Saha Wrote:
>
>Can you file a JIRA, please.
>
>-Gour
>
>On 5/15/15, 7:13 PM, "skaterQiang"  wrote:
>
>>Hello guys:
>>slider registry --name cl1 --conf hbase --format xml cannot work
>>with java 1.8
>>It will throw connect in progress error.
>>https://bugs.openjdk.java.net/browse/JDK-8029127
>>https://java.net/jira/browse/JERSEY-730
>>
>>
>>I have checked code, it is due to jersey 1.9 cannot work with java 1.8
>>And after I change the jdk to 1.7 which oracle does not support now, it
>>works fine.
>>And I have checked 0.6.0, 0.7.1 and 0.8 slider pom, we all use jersey
>>1.9.
>>So I think we need to fix it.
>>
>>
>>Thanks
>

Re: Slider stop not working

2015-05-17 Thread Gour Saha

Chackra,
This is wonderful. Thanks for debugging this and finding a solution. 

Now that we know it works for Debian, we have to check whether this modified 
syntax will work on other OSes. If yes, then we can make a machine-independent 
change. Otherwise we have to make an OS-specific code change. 
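
To make the difference concrete, here is a minimal sketch of the two argument 
lists, the old one and the one with the extra `--`. It is a hypothetical Python 
helper, not the actual Shell.java code, and it assumes the pid is passed with a 
leading hyphen to address the whole process group.

```python
def build_kill_command(signal_no, pid, use_process_group, add_separator=True):
    """Build the argv for kill; '--' marks the end of options so that a
    negative (process group) id is not parsed as another option."""
    args = ["kill", "-" + str(signal_no)]
    if use_process_group:
        if add_separator:
            args.append("--")
        args.append("-" + str(pid))
    else:
        args.append(str(pid))
    return args
```

Checking other OSes then comes down to running both variants there, for 
example build_kill_command(15, 6007, True) versus the same call with 
add_separator=False, and confirming the container processes go away.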

I would suggest you submit a patch for this change on the hadoop trunk branch. 
You should get credit for it. 

-Gour

- Sent from my iPhone

> On May 17, 2015, at 11:05 AM, "Chackravarthy Esakkimuthu" 
>  wrote:
> 
> Gour/Steve,
> 
> The issue was because of improper kill command construction by
> DefaultContainerExecutor, and hence kill SIGTERM itself was not issued to
> SliderAgent, hence all the agents as well as components continue to run.
> 
> I made one change in Shell.java (hadoop-common)  to construct the kill
> command including two hyphens, then now slider stop works properly:)
> 
> It was, *kill -signalNo -<pid>*
> changed as,   *kill -signalNo -- -<pid>*
> 
> I have updated the same in JIRA as well,
> 
> https://issues.apache.org/jira/browse/YARN-3561
> 
> 
> Thanks,
> Chackra
> 
> 
> On Thu, May 14, 2015 at 1:03 PM, Chackravarthy Esakkimuthu <
> chaku.mi...@gmail.com> wrote:
> 
>> sure Gour, would like to take up this task and contribute. Thanks for the
>> pointers for me to proceed with, I will get in touch with you incase If I
>> need any more help.
>> 
>> And wrt kill -s TERM on main.py processes (tried on both parent and child
>> process independently), please find the result as follows :
>> 
>> In none of the cases, application was killed.
>> 
>> *1) Slider app created, and its running (not stopped)*
>> 
>> *1.1) kill 'bash main.py' process*
>> 
>>   -  it killed both 'bash main.py' and its 'child main.py' process
>>   -  but the application process (nimbus) still running
>> 
>> 
>> SliderAgent.log :
>> 
>> *INFO 2015-05-14 12:17:06,591 Controller.py:497 - Component states
>> (result): Expected: 4 and Actual: 5*
>> *ERROR 2015-05-14 12:17:06,596 Controller.py:289 - Got terminateAgent
>> command*
>> *INFO 2015-05-14 12:17:16,597 Controller.py:217 - Terminate agent command
>> received from AM, stopping the agent ...*
>> *INFO 2015-05-14 12:17:16,597 ProcessHelper.py:39 - Removing pid file*
>> *WARNING 2015-05-14 12:17:16,598 ProcessHelper.py:44 - Unable to remove
>> pid file: [Errno 2] No such file or directory:
>> '/grid/4/hadoop/yarn/local/usercache/yarn/appcache/application_1431424102217_0003/container_1431424102217_0003_01_02/infra/run/agent.pid'*
>> *INFO 2015-05-14 12:17:16,598 ProcessHelper.py:46 - Removing temp files*
>> *1.2) kill 'child main.py' process*
>> 
>>   - it also killed both 'bash main.py' and its 'child main.py' process
>>   - but the application process (nimbus) still running
>> 
>> 
>> SliderAgent.log :
>> 
>> *INFO 2015-05-14 12:25:37,990 main.py:56 - signal received, exiting.*
>> *INFO 2015-05-14 12:25:37,990 ProcessHelper.py:39 - Removing pid file*
>> *INFO 2015-05-14 12:25:37,990 ProcessHelper.py:46 - Removing temp files*
>> 
>> 
>> *2) Slider app created, and its stopped.*
>> 
>> *2.1) kill 'bash main.py' process*
>> 
>>   - it killed only 'bash main.py' and not 'child main.py' process
>>   - And application process (nimbus) still running
>>   - there is *no logs came in SliderAgent*
>>   - And container logs are completely cleared by the time this action is
>>   done
>> 
>> *2.2) kill 'child main.py' process*
>> 
>>   - it killed both 'bash main.py' and its 'child main.py' process
>>   - And application process (nimbus) still running
>>   - And container logs are completely cleared by the time this action is
>>   done
>> 
>> 
>> SliderAgent.log :
>> 
>> *INFO 2015-05-14 12:48:25,589 main.py:56 - signal received, exiting.*
>> *INFO 2015-05-14 12:48:25,589 ProcessHelper.py:39 - Removing pid file*
>> *INFO 2015-05-14 12:48:25,590 ProcessHelper.py:46 - Removing temp files*
>> 
>> 
>>> On Thu, May 14, 2015 at 1:38 AM, Gour Saha  wrote:
>>> 
>>> Hi Chackra,
>>> 
>>> You are absolutely right. The workaround that I was planning to work on,
>>> should be implemented as a neat backup solution, when YARN fails to
>>> shutdown containers (in this and certain other possible scenarios).
>>> 
>>> In fact, we had filed a bug long time back along the same lines,
>>> predicting this issue

Re: slider may report

2015-05-17 Thread Gour Saha

Absolutely. We also got sign off from Slider's mentor Mahadev.

-Gour
 

On 5/16/15, 1:53 PM, "Billie Rinaldi"  wrote:

>Hi Gour,
>
>The May report has already shipped, but this sounds like a good start for
>next time.  :-)
>
>Billie
>
>On Fri, May 15, 2015 at 2:45 PM, Gour Saha  wrote:
>
>> +1 from me with some additional points -
>>
>> For the section "How has the community developed since the last
>>report?",
>> can we modify it as -
>>
>> We are seeing a lot of traction in the community, over the improvement
>>in
>> deployment and management of applications using Slider. Good number of
>> emails are pouring in the dev list, primarily on creation of Slider
>> packages for newer applications and willingness to contribute code. Few
>> presentations and meetups were organized by the community, socializing
>> their Slider-ized applications. Occasional patches are being submitted
>>and
>> a number of features added to the recent releases were driven by user
>>needs.
>>
>> If the above paragraph is acceptable, then can we get rid of point 2 in
>> "Three most important issues to address in the move towards
>>graduation"? If
>> not, then can we at least modify it to be just -
>> 2. Achieving broader adoption of the existing code
>>
>> For the section, "How has the project developed since the last report?"
>> can we add at the end -
>>
>> We completed release 0.61.0-incubating in February with more complex
>> release artifacts than we had released previously. We also released
>> 0.70.1-incubating in March. Planning is under way for releasing 0.80 in
>>May
>> 2015, with 64 JIRAs resolved and critical business application features
>> like rolling upgrade, co-processor support for dynamic addition of
>> libraries, and support for Docker containers.
>>
>>
>> -Gour
>>
>> On 5/6/15, 6:23 AM, "Ted Yu" <yuzhih...@gmail.com> wrote:
>>
>> +1
>>
>> On Wed, May 6, 2015 at 6:05 AM, Jon Maron <jma...@hortonworks.com> wrote:
>>
>> +1
>>
>> > On May 6, 2015, at 8:58 AM, Billie Rinaldi <billie.rina...@gmail.com> wrote:
>> >
>> > How does this draft look?
>> >
>> > Slider
>> >
>> > Slider is a collection of tools and technologies to package, deploy,
>>and
>> > manage long running applications on Apache Hadoop YARN clusters.
>> >
>> > Slider has been incubating since 2014-04-29.
>> >
>> > Three most important issues to address in the move towards graduation:
>> >
>> >  1. Building a diverse developer and user community
>> >  2. Achieving broader adoption of the existing code and
>>slider-deployable
>> > applications (examples: HBase, Accumulo)
>> >  3. Making slider better at deploying other applications, so improving
>> > takeup.
>> >
>> > Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
>> > aware of?
>> >
>> >  None
>> >
>> > How has the community developed since the last report?
>> >
>> >  We are getting some questions from users on the list and an
>>occasional
>> >  patch submitted.  A number of features added to the recent releases
>> >  were driven by user needs.
>> >
>> > How has the project developed since the last report?
>> >
>> >  We completed release 0.61.0-incubating in February with more complex
>> >  release artifacts than we had released previously.  We also released
>> >  0.70.1-incubating in March.  Planning is under way for releasing
>>0.80 in
>> >  May 2015.
>> >
>> > Date of last release:
>> >
>> >  2015-03-31: Slider 0.70.1-incubating
>> >  2015-02-19: Slider 0.61.0-incubating
>> >
>> > When were the last committers or PMC members elected?
>> >
>> >  2014-09-27: Gour Saha, committer and PPMC member
>>
>>
>>
>>

[ANNOUNCE] Apache Slider 0.80.0-incubating release

2015-05-19 Thread Gour Saha

The Apache Slider team is proud to announce Apache Slider incubation release
version 0.80.0-incubating.

Apache Slider (incubating) is a YARN application which deploys existing
distributed applications on YARN, monitors them, and makes them larger or
smaller as desired - even while the application is running.

The release artifacts are available at:
http://www.apache.org/dyn/closer.cgi/incubator/slider/0.80.0-incubating

To use these artifacts, please use the following documentation:
http://slider.incubator.apache.org/docs/getting_started.html

Release notes available at:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315422&version=12329384

We would like to thank all the contributors that made the release possible.

Regards,
The Slider Team

-

DISCLAIMER

Apache Slider is an effort undergoing incubation at The Apache Software
Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further review
indicates that the infrastructure, communications,
and decision making process have stabilized in a manner consistent with
other successful ASF projects. While incubation status
is not necessarily a reflection of the completeness or stability of the
code, it does indicate that the project has yet to be fully
endorsed by the ASF.

Re: Can't get exports to work ???

2015-05-19 Thread Gour Saha

What is the exact command you ran?

Can you try this -
slider registry --name <app-name> --getexp servers

-Gour

On 5/19/15, 9:26 AM, "Timothy Potter"  wrote:

>I should note this is with a 0.72.0 build
>
>On Tue, May 19, 2015 at 10:25 AM, Timothy Potter 
>wrote:
>> I have defined a simple export in metainfo.xml:
>>
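>> <exportGroups>
>>   <exportGroup>
>>     <name>Servers</name>
>>     <exports>
>>       <export>
>>         <name>host_port</name>
>>         <value>http://${THIS_HOST}:${site.global.listen_port}/</value>
>>       </export>
>>     </exports>
>>   </exportGroup>
>> </exportGroups>
>>
>> <components>
>>   <component>
>>     <name>SOLR</name>
>>     <category>SLAVE</category>
>>     <compExports>Servers-host_port</compExports>
>>     <commandScript>
>>       <script>scripts/solr_node.py</script>
>>       <scriptType>PYTHON</scriptType>
>>     </commandScript>
>>   </component>
>> </components>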
>>
>> The app deploys fine and my SOLR component is running ... but when I
>> go to the exports endpoint, I don't see my host_port???
>>
>> {"exports":{
>>   "servers":{"description":"Servers","updated":1432052275319,"updatedTime":"Tue May 19 10:17:55 MDT 2015","entries":{},"empty":true},
>>   "container_log_dirs":{"description":"container_log_dirs","updated":1432052275319,"updatedTime":"Tue May 19 10:17:55 MDT 2015","entries":{},"empty":true},
>>   "container_work_dirs":{"description":"container_work_dirs","updated":1432052275319,"updatedTime":"Tue May 19 10:17:55 MDT 2015","entries":{},"empty":true}}}
>>
>> Any ideas what I'm doing wrong here?

Re: Can't get exports to work ???

2015-05-19 Thread Gour Saha

I think you are using Java 8, so you are hitting -
https://issues.apache.org/jira/browse/SLIDER-878

We are working on it. Meanwhile you can use Java 7 and the cmd will work.

You can also try to hit the AM URL and fetch the exports. It should be
something like -

http://<am-host>:<am-port>/ws/v1/slider/publisher/slider/servers

The Application Master link should be available from RM UI page as well -
http://<rm-host>:8088/cluster
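
If you want to script the check, something along these lines may help. It is
a sketch only: the URL path is the one given above, the host and port are
whatever your AM is listening on, and the JSON shape is assumed to match the
exports listing you posted. The helper names are made up.

```python
import json


def exports_url(am_host, am_port, group):
    """Build the AM publisher URL for one export group."""
    return "http://%s:%s/ws/v1/slider/publisher/slider/%s" % (am_host, am_port, group)


def empty_export_groups(payload):
    """Return the sorted names of export groups that have no entries."""
    groups = json.loads(payload).get("exports", {})
    return sorted(name for name, group in groups.items()
                  if not group.get("entries"))
```

The body can be fetched with urllib.request.urlopen(url).read() and passed to
empty_export_groups(); in your output all three groups would be reported as
empty.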

-Gour

On 5/19/15, 11:02 AM, "Timothy Potter"  wrote:

>I get this:
>
>[~/dev/lw/slider/slider-0.72.0-incubating-SNAPSHOT]$ bin/slider
>registry --name solr --getexp Servers --manager localhost:8032
>2015-05-19 12:02:07,841 [main] INFO  client.RMProxy - Connecting to
>ResourceManager at localhost/127.0.0.1:8032
>Exception: java.lang.IllegalStateException: connect in progress
>2015-05-19 12:02:09,407 [main] ERROR main.ServiceLauncher - Exception:
>java.lang.IllegalStateException: connect in progress
>com.sun.jersey.api.client.ClientHandlerException: java.lang.IllegalStateException: connect in progress
>at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
>at com.sun.jersey.api.client.Client.handle(Client.java:648)
>at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>at com.sun.jersey.api.client.WebResource.get(WebResource.java:191)
>at org.apache.slider.core.registry.retrieve.RegistryRetriever.getExports(RegistryRetriever.java:137)
>at org.apache.slider.client.SliderClient.actionRegistryGetExport(SliderClient.java:3630)
>at org.apache.slider.client.SliderClient.actionRegistry(SliderClient.java:3108)
>at org.apache.slider.client.SliderClient.exec(SliderClient.java:472)
>at org.apache.slider.client.SliderClient.runService(SliderClient.java:374)
>at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
>at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
>at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
>at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
>at org.apache.slider.Slider.main(Slider.java:49)
>Caused by: java.lang.IllegalStateException: connect in progress
>at sun.net.www.protocol.http.HttpURLConnection.setRequestMethod(HttpURLConnection.java:515)
>at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.setRequestMethodUsingWorkaroundForJREBug(URLConnectionClientHandler.java:259)
>at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:191)
>at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
>... 13 more
>2015-05-19 12:02:09,410 [main] INFO  util.ExitUtil - Exiting with status 56
>
>On Tue, May 19, 2015 at 11:23 AM, Gour Saha  wrote:
>> What is the exact command you ran?
>>
>> Can you try this -
>> slider registry --name <app-name> --getexp servers
>>
>> -Gour
>>
>> On 5/19/15, 9:26 AM, "Timothy Potter"  wrote:
>>
>>>I should note this is with a 0.72.0 build
>>>
>>>On Tue, May 19, 2015 at 10:25 AM, Timothy Potter 
>>>wrote:
>>>> I have defined a simple export in metainfo.xml:
>>>>
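>>>> <exportGroups>
>>>>   <exportGroup>
>>>>     <name>Servers</name>
>>>>     <exports>
>>>>       <export>
>>>>         <name>host_port</name>
>>>>         <value>http://${THIS_HOST}:${site.global.listen_port}/</value>
>>>>       </export>
>>>>     </exports>
>>>>   </exportGroup>
>>>> </exportGroups>
>>>>
>>>> <components>
>>>>   <component>
>>>>     <name>SOLR</name>
>>>>     <category>SLAVE</category>
>>>>     <compExports>Servers-host_port</compExports>
>>>>     <commandScript>
>>>>       <script>scripts/solr_node.py</script>
>>>>       <scriptType>PYTHON</scriptType>
>>>>     </commandScript>
>>>>   </component>
>>>> </components>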
>>>>
>>>> The app deploys fine and my SOLR component is running ... but when I
>>>> go to the exports endpoint, I don't see my host_port???
>>>>
>>>> {"exports":{
>>>>   "servers":{"description":"Servers","updated":1432052275319,"updatedTime":"Tue May 19 10:17:55 MDT 2015","entries":{},"empty":true},
>>>>   "container_log_dirs":{"description":"container_log_dirs","updated":1432052275319,"updatedTime":"Tue May 19 10:17:55 MDT 2015","entries":{},"empty":true},
>>>>   "container_work_dirs":{"description":"container_work_dirs","updated":1432052275319,"updatedTime":"Tue May 19 10:17:55 MDT 2015","entries":{},"empty":true}}}
>>>>
>>>> Any ideas what I'm doing wrong here?
>>

Re: Weird errors when trying to stop my slider app

2015-05-19 Thread Gour Saha

Can you send me the Slider AM logs as well? It should be the logs in the
container with id container_1432005178704_0014_01_01.

Also, before you issued the stop command, was Solr up and running and in
good health?

-Gour

On 5/19/15, 2:37 PM, "Timothy Potter"  wrote:

>Using 0.72.0 build ...
>
>I deploy my app successfully, but when I try to stop it using:
>
>bin/slider stop solr
>
>It doesn't look like my stop python method is ever called and the
>underlying Solr process is not stopped. In the slider-agent.log, I see
>this:
>
>INFO 2015-05-19 15:35:41,571 security.py:132 - Encountered
>communication error. Details: BadStatusLine("''",)
>ERROR 2015-05-19 15:35:41,571 Controller.py:562 - Exception raised
>Traceback (most recent call last):
>  File 
>"/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache/a
>pplication_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-agen
>t/agent/Controller.py",
>line 558, in sendRequest
>  File 
>"/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache/a
>pplication_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-agen
>t/agent/security.py",
>line 134, in request
>IOError: Error occured during connecting to the server: ''
>WARNING 2015-05-19 15:35:41,571 Controller.py:565 - Request failed!
>Data: {"nodeStatus": {"status": "HEALTHY", "cause": "NONE"},
>"timestamp": 1432071341566, "hostname":
>"container_1432005178704_0014_01_02___SOLR", "responseId": 46,
>"fqdn": "Lucids-MacBook-Pro.local", "reports": []}
>ERROR 2015-05-19 15:35:45,575 Controller.py:374 - Unable to connect
>to: 
>https://Lucids-MacBook-Pro.local:52672/ws/v1/slider/agents/container_14320
>05178704_0014_01_02___SOLR/heartbeat
>due to expected string or buffer
>ERROR 2015-05-19 15:35:45,575 Controller.py:384 - Heartbeat retry count =
>1
>INFO 2015-05-19 15:35:55,584 security.py:89 - SSL Connect being
>called.. connecting to the server
>ERROR 2015-05-19 15:35:55,586 Controller.py:562 - Exception raised
>Traceback (most recent call last):
>  File 
>"/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache/a
>pplication_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-agen
>t/agent/Controller.py",
>line 556, in sendRequest
>  File 
>"/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache/a
>pplication_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-agen
>t/agent/security.py",
>line 106, in __init__
>  File 
>"/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache/a
>pplication_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-agen
>t/agent/security.py",
>line 111, in connect
>  File 
>"/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache/a
>pplication_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-agen
>t/agent/security.py",
>line 49, in connect
>  File 
>"/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache/a
>pplication_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-agen
>t/agent/security.py",
>line 90, in create_connection
>  File 
>"/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/so
>cket.py",
>line 571, in create_connection
>raise err
>error: [Errno 61] Connection refused
>WARNING 2015-05-19 15:35:55,587 Controller.py:565 - Request failed!
>Data: {"nodeStatus": {"status": "HEALTHY", "cause": "NONE"},
>"timestamp": 1432071341566, "hostname":
>"container_1432005178704_0014_01_02___SOLR", "responseId": 46,
>"fqdn": "Lucids-MacBook-Pro.local", "reports": []}

Re: Container recovery on working on CDH with yarn.component.placement.policy=1

2015-05-20 Thread Gour Saha

Thomas,
Resources.json looks ok. Can you send me the following logs so that I can
look further into it -

- Slider AM log
- Slider agent log (for the container that was killed)
- RM log
- NM log from the node where Slider agent (that was killed) was running

-Gour

On 5/19/15, 8:43 AM, "Thomas Weise"  wrote:

>All resources are freed up. The AM requests the replacement container and
>nothing happens after that. Please see:
>
>https://www.dropbox.com/sh/8ub0jedh60cgys4/AACPftofPcdhD5Sb2XADRMTga?dl=0
>
>resources.json
>
>{
>  "schema" : "http://example.org/specification/v2.0.0",
>  "metadata" : {
>  },
>  "global" : {
>"yarn.container.failure.threshold":"10",
>"yarn.container.failure.window.hours":"1"
>  },
>  "components" : {
>"broker" : {
>  "yarn.role.priority" : "1",
>  "yarn.component.instances" : "3",
>  "yarn.memory" : "768",
>  "yarn.vcores" : "1",
>  "yarn.component.placement.policy":"1"
>},
>"slider-appmaster" : {
>}
>  }
>}
>
>
>On Wed, May 13, 2015 at 5:03 PM, Gour Saha  wrote:
>
>> Can you check the resources (memory, cpu) available in the host, after
>> killing the container? Is it freed? Can you hit the RM UI and share what
>> you see in the "Cluster Metrics" table for that node?
>>
>> Also, if possible please share your resources.json.
>>
>> -Gour
>>
>> On 5/12/15, 9:34 AM, "Thomas Weise"  wrote:
>>
>> >We are testing KOYA on CDH 5.4. We see that after killing the container
>> >Slider as expected will ask for the same host. The request is never
>>filled
>> >and the container cannot be redeployed. We see this behavior on CDH
>>with
>> >DataTorrent also, it looks like a CDH bug.
>> >
>> >Anyone else trying to run Slider on CDH and sees the same behavior? Any
>> >insight on whether that is a CDH configuration issue or fair scheduler
>> >bug?
>> >
>> >Thanks,
>> >Thomas
>>
>>

Re: Weird errors when trying to stop my slider app

2015-05-22 Thread Gour Saha

>impl.ContainerManagementProtocolProxy - Opening proxy :
>192.168.1.3:52525
>2015-05-19 15:35:32,179 [AmExecutor-005] INFO  actions.QueueService -
>QueueService processor terminated
>2015-05-19 15:35:32,179 [AmExecutor-006] WARN  actions.ActionStopQueue -
>STOP
>2015-05-19 15:35:32,179 [AmExecutor-006] INFO  actions.QueueExecutor -
>Queue Executor run() stopped
>2015-05-19 15:35:32,179 [AMRM Callback Handler Thread] INFO
>impl.AMRMClientAsyncImpl - Interrupted while waiting for queue
>java.lang.InterruptedException
>at 
>java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.repo
>rtInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>at 
>java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awai
>t(AbstractQueuedSynchronizer.java:2052)
>at 
>java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442
>)
>at 
>org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackH
>andlerThread.run(AMRMClientAsyncImpl.java:274)
>
>On Tue, May 19, 2015 at 5:37 PM, Gour Saha  wrote:
>> Can you send me the Slider AM logs as well? It should be the logs in the
>> container with id container_1432005178704_0014_01_01.
>>
>> Also,
>> Before you issued the stop command, was Solr up and running and in good
>> health?
>>
>> -Gour
>>
>> On 5/19/15, 2:37 PM, "Timothy Potter"  wrote:
>>
>>>Using 0.72.0 build ...
>>>
>>>I deploy my app successfully, but when I try to stop it using:
>>>
>>>bin/slider stop solr
>>>
>>>It doesn't look like my stop python method is ever called and the
>>>underlying Solr process is not stopped. In the slider-agent.log, I see
>>>this:
>>>
>>>INFO 2015-05-19 15:35:41,571 security.py:132 - Encountered
>>>communication error. Details: BadStatusLine("''",)
>>>ERROR 2015-05-19 15:35:41,571 Controller.py:562 - Exception raised
>>>Traceback (most recent call last):
>>>  File
>>>"/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache
>>>/a
>>>pplication_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-ag
>>>en
>>>t/agent/Controller.py",
>>>line 558, in sendRequest
>>>  File
>>>"/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache
>>>/a
>>>pplication_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-ag
>>>en
>>>t/agent/security.py",
>>>line 134, in request
>>>IOError: Error occured during connecting to the server: ''
>>>WARNING 2015-05-19 15:35:41,571 Controller.py:565 - Request failed!
>>>Data: {"nodeStatus": {"status": "HEALTHY", "cause": "NONE"},
>>>"timestamp": 1432071341566, "hostname":
>>>"container_1432005178704_0014_01_02___SOLR", "responseId": 46,
>>>"fqdn": "Lucids-MacBook-Pro.local", "reports": []}
>>>ERROR 2015-05-19 15:35:45,575 Controller.py:374 - Unable to connect
>>>to:
>>>https://Lucids-MacBook-Pro.local:52672/ws/v1/slider/agents/container_143
>>>20
>>>05178704_0014_01_02___SOLR/heartbeat
>>>due to expected string or buffer
>>>ERROR 2015-05-19 15:35:45,575 Controller.py:384 - Heartbeat retry count
>>>=
>>>1
>>>INFO 2015-05-19 15:35:55,584 security.py:89 - SSL Connect being
>>>called.. connecting to the server
>>>ERROR 2015-05-19 15:35:55,586 Controller.py:562 - Exception raised
>>>Traceback (most recent call last):
>>>  File
>>>"/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache
>>>/a
>>>pplication_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-ag
>>>en
>>>t/agent/Controller.py",
>>>line 556, in sendRequest
>>>  File
>>>"/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache
>>>/a
>>>pplication_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-ag
>>>en
>>>t/agent/security.py",
>>>line 106, in __init__
>>>  File
>>>"/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache
>>>/a
>>>pplication_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-ag
>>>en
>>>t/agent/security.py",
>>>line 111, in connect
>>>  File
>>>"/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache
>>>/a
>>>pplication_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-ag
>>>en
>>>t/agent/security.py",
>>>line 49, in connect
>>>  File
>>>"/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache
>>>/a
>>>pplication_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-ag
>>>en
>>>t/agent/security.py",
>>>line 90, in create_connection
>>>  File
>>>"/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
>>>so
>>>cket.py",
>>>line 571, in create_connection
>>>raise err
>>>error: [Errno 61] Connection refused
>>>WARNING 2015-05-19 15:35:55,587 Controller.py:565 - Request failed!
>>>Data: {"nodeStatus": {"status": "HEALTHY", "cause": "NONE"},
>>>"timestamp": 1432071341566, "hostname":
>>>"container_1432005178704_0014_01_02___SOLR", "responseId": 46,
>>>"fqdn": "Lucids-MacBook-Pro.local", "reports": []}
>>

Re: Could slider create 2+ hbase clusters in one yarn cluster?

2015-05-22 Thread Gour Saha

Yes it is supported. 

Typically this happens when there are insufficient resources in your cluster.

Can you send your cluster specs (# of nodes, memory, vcore) and the hbase 
resources.json?
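
As a rough way to sanity-check this yourself, here is a minimal sketch (plain Python, not part of Slider) that adds up the memory a resources.json asks for and compares it with what the cluster has free. The component names and numbers below are made up for illustration, and the sketch deliberately ignores the AM's own container, vcores, and any scheduler rounding, so treat the result as a lower bound only.

```python
def requested_memory_mb(resources):
    """Sum yarn.memory * yarn.component.instances over all components."""
    total = 0
    for spec in resources.get("components", {}).values():
        # Components that do not declare both keys (for example an empty
        # slider-appmaster entry) are skipped rather than given a default.
        if "yarn.memory" in spec and "yarn.component.instances" in spec:
            total += int(spec["yarn.memory"]) * int(spec["yarn.component.instances"])
    return total


def fits_in_cluster(resources, free_mb):
    """True if the summed request is no larger than the free memory (MB)."""
    return requested_memory_mb(resources) <= free_mb


# Hypothetical resources.json content, already parsed into a dict.
example_resources = {
    "components": {
        "HBASE_MASTER": {"yarn.component.instances": "1", "yarn.memory": "1024"},
        "HBASE_REGIONSERVER": {"yarn.component.instances": "2", "yarn.memory": "1024"},
        "slider-appmaster": {},
    }
}
```

If the second cluster's request does not fit in what is left after the first one is running, its containers (or its AM) will simply wait for resources.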

-Gour

- Sent from my iPhone

> On May 22, 2015, at 8:40 PM, "skaterQiang"  wrote:
> 
> Hello guys:
> Could slider create 2+ hbase clusters in one yarn cluster?
> I have tried with slider 0.6, when the 2nd hbase cluster submited, its 
> appmaster will be blocked and in unassigned status.
> Only when the 1st hbase cluster stop, and then 2nd one can be started.
> 
> 
> Regards,
> Qiang

Re: Could slider create 2+ hbase clusters in one yarn cluster?

2015-05-23 Thread Gour Saha

You can create multiple instances with the same app package, as long as you 
give different names. 

-Gour

- Sent from my iPhone

> On May 22, 2015, at 11:22 PM, "xqfly...@163.com"  wrote:
> 
> i found the 2nd one start with the first cluster.
> is it due to i use the same hbase package?
> how to make two clusters? is there some examples for create two hbase 
> clusters?
> 
> thanks
> 
> 
> 
> On 2015-05-23 13:19 , Gour Saha Wrote:
> 
> Yes it is supported.
> 
> Typically this happens when there is insufficient resources in your cluster.
> 
> Can you send your cluster specs (# of nodes, memory, vcore) and the hbase 
> resources.json?
> 
> -Gour
> 
> - Sent from my iPhone
> 
>> On May 22, 2015, at 8:40 PM, "skaterQiang"  wrote:
>> 
>> Hello guys:
>>Could slider create 2+ hbase clusters in one yarn cluster?
>> I have tried with slider 0.6, when the 2nd hbase cluster submited, its 
>> appmaster will be blocked and in unassigned status.
>> Only when the 1st hbase cluster stop, and then 2nd one can be started.
>> 
>> 
>> Regards,
>> Qiang

Re: Interest in including a package for Solr in Slider?

2015-05-27 Thread Gour Saha

Yes, please file a JIRA and submit as a patch. I have already tested the
package on CentOS6. Everything looks good, except that I am unable to see
"zkhost" in servers export. I see "host_port" only.

Couple more points:
- Is this package platform independent? If not, do you plan to submit a
separate patch for Windows? If yes, would be great if it could be
mentioned in the README.
- Please ensure Apache license headers are added to all files including
README.md (refer to files under
https://github.com/apache/incubator-slider/blob/develop/app-packages/hbase
as example). I think the json files can be skipped.
- Remove the LICENSE file from the top-level dir

-Gour

On 5/27/15, 7:50 AM, "Ted Yu"  wrote:

>Please open a JIRA.
>
>Thanks
>
>On Wed, May 27, 2015 at 6:24 AM, Timothy Potter 
>wrote:
>
>> Hi,
>>
>> I'm a committer on the Lucene/Solr project and have developed a basic
>> solution for deploying Solr to YARN using Slider, see:
>> https://github.com/LucidWorks/solr-slider
>>
>> I'm reaching out to see if there is interest in including Solr as one
>> of the packages that ships with Slider, as you do today with Storm,
>> memcached, HBase, etc ...
>>
>> If so, I can open a JIRA and prepare a patch based on my github
>> project, but wanted to gauge interest before going down that path.
>>
>> Cheers,
>> Timothy Potter
>>

Re: Not able to build fresh checkout from git?

2015-05-27 Thread Gour Saha

Hadoop 2.6.0 was released long back, so the 2.6.0-SNAPSHOT reference is
stale.

Slider master branch will not be updated going forward.

As Billie suggested, you can use the latest release 0.80.0-incubating (or
develop).

-Gour

On 5/27/15, 12:19 PM, "Billie Rinaldi"  wrote:

>The master branch has gotten a bit stale.  I'd suggest checking out the
>develop branch, or the latest release tag.
>
>On Wed, May 27, 2015 at 12:09 PM, Timothy Potter 
>wrote:
>
>> Just pulled a fresh copy and am getting this:
>>
>> [ERROR] Failed to execute goal on project slider-core: Could not
>> resolve dependencies for project
>> org.apache.slider:slider-core:jar:0.60.0-incubating: The following
>> artifacts could not be resolved:
>> org.apache.hadoop:hadoop-client:jar:2.6.0-SNAPSHOT,
>> org.apache.hadoop:hadoop-yarn-client:jar:2.6.0-SNAPSHOT,
>> org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:2.6.0-SNAPSHOT,
>> org.apache.hadoop:hadoop-yarn-registry:jar:2.6.0-SNAPSHOT,
>> org.apache.hadoop:hadoop-minicluster:jar:2.6.0-SNAPSHOT: Could not
>> find artifact org.apache.hadoop:hadoop-client:jar:2.6.0-SNAPSHOT in
>> ASF Staging (https://repository.apache.org/content/groups/staging/) ->
>> [Help 1]
>>
>> I tried mvn -U clean package too ... no luck
>>

Re: Confused about exports

2015-05-29 Thread Gour Saha

It can be considered as an enhancement, but this is what happens today:
exports are displayed only if they contain one of the 2 dynamic (randomly
assigned) aspects.

That is, they need either an allocated port or an assigned host. For
example, the variables ${SOLR_HOST} and ${site.global.listen_port} are both
dynamically allocated. The zk_host variable has a hard-coded value and
hence is not considered to be export-worthy.

Hardcoded values can always be extracted by running slider status 

But I think, if exporting hard coded values helps an application, then we
should consider exporting them as well. Feel free to file a JIRA.
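
To make the rule above concrete, here is a small sketch in Python. It is not Slider's actual code; it only mimics the behaviour described in this mail. The set of "dynamic" variable names is an assumption taken from this thread, and the function name is hypothetical.

```python
import re

# Matches ${...} references inside an export value template.
_VAR = re.compile(r"\$\{([^}]+)\}")

# Assumption for illustration: the variables this thread says are
# dynamically allocated for the Solr package.
DYNAMIC_VARS = {"SOLR_HOST", "site.global.listen_port"}


def is_export_worthy(template, dynamic_vars=DYNAMIC_VARS):
    """True if the template references at least one dynamic variable."""
    return any(name in dynamic_vars for name in _VAR.findall(template))
```

With these assumptions, the host_port template qualifies because it references an assigned host and an allocated port, while a template that only references ${site.global.zk_host} does not.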

-Gour


On 5/28/15, 9:28 AM, "Timothy Potter"  wrote:

>Yes, for sure (else Solr wouldn't even run):
>
>"site.global.zk_host": "localhost:2181",
>
>
>On Thu, May 28, 2015 at 10:26 AM, Sumit Mohanty
> wrote:
>> Do you have an entry such as
>>
>> "site.global.zk_host":"${ZK_HOST}",
>>
>> in the app config json provided to the app?
>> 
>> From: Timothy Potter 
>> Sent: Thursday, May 28, 2015 8:23 AM
>> To: dev@slider.incubator.apache.org
>> Subject: Confused about exports
>>
>> This is for the Solr package ... My goal is to expose the ZooKeeper
>> connection string used by SolrCloud in the registry so client
>> applications can look it up dynamically. Thus, in my metainfo.xml, I
>> have the following:
>>
>> 
>>   
>> Servers
>> 
>>   
>> host_port
>> http://${SOLR_HOST}:${site.global.listen_port}/
>>   
>>   
>> zkhost
>> ${site.global.zk_host}
>>   
>> 
>>   
>> 
>>
>> 
>>   
>> SOLR
>> SLAVE
>> Servers-host_port,Servers-zkhost
>> 
>>   scripts/solr_node.py
>>   PYTHON
>> 
>>   
>> 
>>
>> In the slider.log, I see:
>>
>> 2015-05-28 09:13:20,370 [1604725103@qtp-819364987-4] INFO
>> agent.AgentProviderService - Attempting to publish zkhost of group
>> Servers for component type SOLR
>> 2015-05-28 09:13:20,370 [1604725103@qtp-819364987-4] INFO
>> agent.AgentProviderService - publishing
>> PublishedConfiguration{description='Servers' entries = 1}
>> 2015-05-28 09:13:20,371 [1604725103@qtp-819364987-4] INFO
>> agent.AgentProviderService - Component operation. Status: COMPLETED;
>> new container state: HEALTHY; new component state: INSTALLED
>> 2015-05-28 09:13:20,371 [AmExecutor-006] INFO
>> appmaster.SliderAppMaster - Registering component
>> container_1432823188593_0003_01_03
>>
>> When I run the registry command for Solr, I only get the host_port
>> values, but not the ZooKeeper setting ... the only thing I can think
>> of is it is the same value for all SOLR components. Here's the output
>> from the registry command:
>>
>> 
>>[~/dev/lw/projects/incubator-slider/slider-assembly/target/slider-0.81.0-
>>incubating-SNAPSHOT]$
>> bin/slider registry --name solr --getexp servers
>> 2015-05-28 09:15:49,360 [main] INFO  client.RMProxy - Connecting to
>> ResourceManager at localhost/127.0.0.1:8032
>> {
>>   "host_port" : [ {
>> "value" : "http://Lucids-MacBook-Pro.local:49963/",
>> "containerId" : "container_1432823188593_0003_01_03",
>> "tag" : "1",
>> "level" : "component",
>> "updatedTime" : "Thu May 28 09:13:20 MDT 2015"
>>   }, {
>> 2015-05-28 09:15:50,966 [main] INFO  util.ExitUtil - Exiting with
>>status 0
>> "value" : "http://Lucids-MacBook-Pro.local:49964/",
>> "containerId" : "container_1432823188593_0003_01_02",
>> "tag" : "2",
>> "level" : "component",
>> "updatedTime" : "Thu May 28 09:13:20 MDT 2015"
>>   } ]
>> }
>>
>> This is with the latest develop branch.
>

Re: Need hostname for use in appConfig.json

2015-06-01 Thread Gour Saha

Have you tried using ${THIS_HOST} in appConfig? Did it not work?

-Gour

On 6/1/15, 9:14 AM, "Nathaniel Braun"  wrote:

>Hi everyone,
>
>We are currently working on the configuration files with Kerberos
>principals in them, and it turns out that the Kerberos principal is
>linked to the hostname, so we need it.
>
>What we would like to do is something like that:
>
>
>1.   In appConfig.json
>
>Set the global hostname: "site.global.hostname": "${THIS_HOST}"
>
>
>2.   In our default httpfs-site configuration file:
>
>Read that value using the following piece of code:
>
>httpfs.authentication.kerberos.principalHTTP/${@//site
>/global/hostname}
>httpfs.hadoop.authentication.kerberos.principalHTTP/${
>@//site/global/hostname}
>
>For this to work, we need the THIS_HOST variable to work in the
>appConfig.json file.
>
>How can we achieve such a feature?
>
>Thanks & regards,
>Nathaniel
>

Re: registry / export question

2015-06-01 Thread Gour Saha

You can combine ZK and REST to find the info you are looking for.

Use a zk client and do this -

get /registry/users//services/org-apache-slider/

(with appropriate  and  of the koya cluster)

From the json dump, look for the element with api =
"classpath:org.apache.slider.publisher.exports" under the "external" element.
Get the value of "addresses"->"uri", e.g.:
http://c6401.ambari.apache.org:1025/ws/v1/slider/publisher/exports

Then you can do -
curl "http://c6401.ambari.apache.org:1025/ws/v1/slider/publisher/exports/"

e.g.
curl "http://c6401.ambari.apache.org:1025/ws/v1/slider/publisher/exports/servers"

Does this help?
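
The JSON lookup step described above can be sketched in Python as follows. This is only an illustration: the sample record is a trimmed, made-up example of the shape described in this mail (an "external" list whose entries carry "api" and "addresses"), the api string is assumed to be "classpath:org.apache.slider.publisher.exports" (it was wrapped across lines in the mail), and the function name is hypothetical.

```python
import json

# Assumed api value of the exports endpoint in the registry record.
EXPORTS_API = "classpath:org.apache.slider.publisher.exports"


def find_exports_uri(record_json):
    """Return the exports URI from a registry record dump, or None."""
    record = json.loads(record_json)
    for endpoint in record.get("external", []):
        if endpoint.get("api") == EXPORTS_API:
            for address in endpoint.get("addresses", []):
                if "uri" in address:
                    return address["uri"]
    return None


# Made-up sample with only the fields this sketch needs.
sample_record = json.dumps({
    "external": [
        {
            "api": EXPORTS_API,
            "addresses": [
                {"uri": "http://c6401.ambari.apache.org:1025/ws/v1/slider/publisher/exports"}
            ],
        }
    ]
})
```

Appending the export group name (for example "/servers") to the returned URI gives the URL to pass to curl.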

Check https://issues.apache.org/jira/browse/SLIDER-151 and
https://issues.apache.org/jira/browse/YARN-913 for a few things to look out
for in the future.

-Gour

On 6/1/15, 5:47 AM, "Jean-Baptiste Note" <jbn...@gmail.com> wrote:

Hi there,

I've successfully exported some host/port dynamic combination in slider for
Kafka on Yarn; they are made available under
publisher/exports/servers on the appmaster (see
https://github.com/jbnote/koya/).

I'm now trying to access this information (really, service location) in two
different ways:

* From within slider. Is there a public API that I could use directly in
python from other slider instances to get to this information ? -- this is
necessary for spawning Kafka mirroring from slider, for instance. From what
I can see in storm-slider, the slider binary is directly invoked.

* From the rest of the world. I was thinking of exporting the data to DNS,
and hoped to do this with a zookeeper-monitoring daemon, which is already
partially implemented. However, none of my exported data seems to be
present in ZK, which I was naively hoping for. Is there something i'm
missing ? I find the ZK way perfect, rather than the REST API which as far
as I can see will require polling. In python monitoring ZK is a breeze.

Can someone familiar with the design intent shed some light on how I should
carry this out?

Kind regards,
JB

Re: registry / export question

2015-06-02 Thread Gour Saha

It is a REST-style URI, so if you append the export group name to the URI
path you will get the info you are looking for.

If that does not answer your question, can you give an example response that 
you are expecting to see?

-Gour

> On Jun 2, 2015, at 8:33 AM, "hsy...@gmail.com"  wrote:
> 
> I've noticed the http://hostname/ws/v1/slider/publisher/exports/
>   only
> gives you the list of export values, but within each one the entries block
> are empty. Is it ok have them all embedded in one response so that you can
> get all information directly from slider AM UI.
> 
> On Mon, Jun 1, 2015 at 12:35 PM, Jean-Baptiste Note 
> wrote:
> 
>> Thanks Gour,
>> 
>> Indeed it does help; because I can see a way to combine these to avoid
>> polling.
>> By monitoring the ZK registry and doing CURL whenever there's a child
>> change in the registry it looks I can reliably track changes in the export
>> group, so this is perfect.
>> I'll let you know how implementation goes :)
>> 
>> Kind regards,
>> JB
>>

Re: component restart by making config changes.

2015-06-07 Thread Gour Saha

Chackra,
If you are thinking of updating without doing a complete stop of your
application, you might want to look into rolling upgrade -
http://slider.incubator.apache.org/docs/slider_specs/application_pkg_upgrade.html


This feature is available from the 0.80.0 release onwards.

-Gour

On 6/7/15, 4:16 AM, "Chackravarthy Esakkimuthu" 
wrote:

>I am able to do complete cluster restart with modified configs, by doing
>'slider update' , 'slider stop', 'slider start'.
>
>Suppose If I want to restart a particular component alone (say STORM
>SUPERVISOR) with the modified configs, how can I do it? Is there an option
>to stop/restart a particular component?

Re: Discussions on Slider

2015-06-08 Thread Gour Saha

Hi Anirban,

Applications that should be considered for deployment via Slider should be
intrinsically long-running applications. Technically, these applications should
continue to run forever unless the stop command is called.

Hence in your simple hello-world application you have to add an infinite wait 
statement after "hello world" is written to hdfs (something like "tail -f 
/dev/null" should work). Subsequently you can call the stop command to bring 
down all the N-instances.
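
As a minimal sketch of that idea, the helper below (hypothetical, plain Python) builds the shell command a start script could run: write "hello world" to an HDFS file, then block forever so the container keeps running until stop is called. The function name and the example path are assumptions for illustration; writing via "hdfs dfs -put -" (read from stdin) is one way to do it, not necessarily what your package should use.

```python
import shlex


def build_start_command(hdfs_path, message="hello world"):
    """Build a shell command that writes a message to HDFS, then waits forever."""
    write = "echo {} | hdfs dfs -put - {}".format(
        shlex.quote(message), shlex.quote(hdfs_path)
    )
    # "tail -f /dev/null" never exits, so the component stays alive
    # until Slider stops the container.
    return write + " && tail -f /dev/null"
```

For example, build_start_command("/tmp/hello.txt") yields a single command line that can be handed to the shell from the component's start method.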

Transient applications are not a good fit for Slider.

The memcached app-package is a very simple package and you can use it as a 
sample for your hello-world app -
https://github.com/apache/incubator-slider/tree/develop/app-packages/memcached

Also, we are constantly trying to improve site documentation, so if possible, 
would appreciate if you can file a doc bug with your inputs/feedback.

-Gour

From: Anirban Banerjee <abaner...@rocketfuel.com>
Date: Monday, June 8, 2015 at 1:28 PM
To: Gour Saha <gs...@hortonworks.com>,
"dev@slider.incubator.apache.org" <dev@slider.incubator.apache.org>
Cc: Nitin Aggarwal <naggar...@rocketfuelinc.com>, Vinod Kumar
Vavilapalli <vino...@hortonworks.com>, Rakesh Saha <rs...@hortonworks.com>
Subject: Re: Discussions on Slider

Adding the DL as per request.

~Anirban

On Mon, Jun 8, 2015 at 1:26 PM, Anirban Banerjee <abaner...@rocketfuel.com> wrote:
Hi Gour,

Thanks for offering to help.

I am trying to build a hello-world application (N instances, each echo hello 
world to hdfs file, and ends) like I described on the white board last Thursday.

The instructions at
http://slider.incubator.apache.org/docs/getting_started.html were easy to
follow up to a point. However, I got stuck at the sample application section
because the document suddenly switched from How-To mode to Reference-Lookup mode.

Do you have a simple tutorial/codelab that helps me to build hello-world 
(described above) from here.

If not, I will dig into the code...

Thanks,
Anirban

On Fri, Jun 5, 2015 at 5:55 PM, Gour Saha <gs...@hortonworks.com> wrote:
Anirban,
Feel free to reach out directly to me or the DL
dev@slider.incubator.apache.org. The DL is preferable as there will be a
larger team behind your questions.

-Gour

Re: Discussions on Slider

2015-06-08 Thread Gour Saha

This should also help -
http://slider.incubator.apache.org/docs/slider_specs/hello_world_slider_app.html

You can ignore the section "Add on package" and beyond in this doc.

-Gour

On 6/8/15, 2:14 PM, "Gour Saha" <gs...@hortonworks.com> wrote:

Hi Anirban,

Applications that should be considered for deployment via Slider, should be 
intrinsically long running applications. Technically these applications should 
continue to run forever, unless the stop command is called.

Hence in your simple hello-world application you have to add an infinite wait 
statement after "hello world" is written to hdfs (something like "tail -f 
/dev/null" should work). Subsequently you can call the stop command to bring 
down all the N-instances.

Transient applications are not a good fit for Slider.

The memcached app-package is a very simple package and you can use it as a 
sample for your hello-world app -
https://github.com/apache/incubator-slider/tree/develop/app-packages/memcached

Also, we are constantly trying to improve site documentation, so if possible, 
would appreciate if you can file a doc bug with your inputs/feedback.

-Gour

From: Anirban Banerjee <abaner...@rocketfuel.com>
Date: Monday, June 8, 2015 at 1:28 PM
To: Gour Saha <gs...@hortonworks.com>,
"dev@slider.incubator.apache.org" <dev@slider.incubator.apache.org>
Cc: Nitin Aggarwal <naggar...@rocketfuelinc.com>, Vinod Kumar
Vavilapalli <vino...@hortonworks.com>, Rakesh Saha <rs...@hortonworks.com>
Subject: Re: Discussions on Slider

Adding the DL as per request.

~Anirban

On Mon, Jun 8, 2015 at 1:26 PM, Anirban Banerjee <abaner...@rocketfuel.com> wrote:
Hi Gour,

Thanks for offering to help.

I am trying to build a hello-world application (N instances, each echo hello 
world to hdfs file, and ends) like I described on the white board last Thursday.

The instructions at 
http://slider.incubator.apache.org/docs/getting_started.html were easy to 
follow upto a point. However, I got stuck at the sample application section 
because document suddenly switched from How-To mode to Reference-Lookup mode.

Do you have a simple tutorial/codelab that helps me to build hello-world 
(described above) from here.

If not, I will dig into the code...

Thanks,
Anirban

On Fri, Jun 5, 2015 at 5:55 PM, Gour Saha <gs...@hortonworks.com> wrote:
Anirban,
Feel free to reach out directly to me or the DL
dev@slider.incubator.apache.org. The DL is preferable as there will be a
larger team behind your questions.

-Gour
