from:"Jon Maron"

Re: [DISCUSS] Next Steps for Slider & First-Class Services in YARN (YARN-4692)

2016-03-31 Thread Jon Maron

The proposal makes sense to me.  Given the relative maturity of the code and 
the overlap in functionality and requirements, it would make sense to proceed 
with Slider as the basis for the long running service support.  

> On Mar 16, 2016, at 10:58 PM, Josh Elser  wrote:
> 
> Thanks for the proposal, Gour! Interesting thought.
> 
> I think it makes sense. As YARN is maturing, long-lived services becoming a 
> primitive is a natural progression. Slider is likely at the forefront of 
> building such a primitive on YARN (from a lot of great planning/design from 
> Steve).
> 
> I think this would definitely be an interesting conversation to be had with 
> YARN (if the other podling members are of the same mindset). I think how this 
> plays out would require a bit of planning/coordination from the Hadoop PMC 
> side.
> 
> Now, there is the other half of Slider: the app-packages. My gut reaction is 
> that YARN would have no interest in owning/maintaining these. This is a bit 
> concerning to me because Slider on its own really isn't that exciting. It's 
> the app-packages that make it so enticing -- build a zip, install it to your 
> cluster, and suddenly users can start dynamically creating clusters (HBase, 
> Accumulo, Storm, etc). I would be strongly opposed to any plan to merge 
> Slider into YARN/Hadoop without a clear path forward on where the 
> app-packages would live. This is extremely important to me.
> 
> I'd love to see where this conversation can go.
> 
> - Josh
> 
> Gour Saha wrote:
>> Slider community,
>> 
>> 
>> The YARN team is discussing, in YARN-4692, how to add 
>> "first class services" directly to YARN. Some of the names in the discussion 
>> document should be familiar: that's because Slider is essentially the 
>> original long-lived application in YARN.
>> 
>> 
>> With YARN-4692, it is 
>> apparent that the Apache Hadoop YARN community is working towards providing 
>> direct support for long-lived services. I think we need to look at that 
>> proposal and think "where and how does Slider relate to this".
>> 
>> 
>> Apache Slider (incubating) has been in the business of creating and managing 
>> long-running services in YARN for a couple of years. Today it is being used 
>> in production YARN clusters across several companies (big and small). 
>> Several production-grade applications (data and non-data) are available as 
>> sample packages. A good number of them have been contributed by interested 
>> parties like Lucidworks contributing a Solr Slider Application Package and 
>> DataTorrent contributing a Kafka Slider Application Package.
>> 
>> 
>> Slider has been pretty good at taking existing applications and turning them 
>> into long-lived services in YARN. YARN offers the core scheduling, execution 
>> and failure reporting functions; slider takes that and adds: advanced 
>> container placement (history; anti-affine, escalation policies), 
>> configuration, dynamic binding, monitoring, failure handling, and an API for 
>> clients. It's also driven a lot of the YARN-896 "long-lived 
>> services" development: long-lived failure resilience, the YARN registry, 
>> container-preservation over YARN restarts. Big chunks of that code actually 
>> came from the Slider team. This was always a goal of the work even in its 
>> Hoya predecessor: show that YARN can be used to host applications like 
>> HBase, and identify where it can be improved.
>> 
>> 
>> What does it mean for Slider if YARN starts doing this directly?
>> 
>> 
>> Slider provides a lot of the basic functionalities for long-running services 
>> proposed in YARN-4692. It is a universal YARN app-master and lets 
>> application-owners focus on their application functionalities, while it 
>> handles the internals of orchestrating services on YARN.
>> 
>> 
>> Which means: we have an opportunity here to contribute the core of slider 
>> into YARN itself, and, with it in YARN, use it as the basis for the full 
>> TODO-list of YARN-4692.
>> 
>> 
>> The YARN team gets the stable codebase that's evolved over the past few 
>> years: something to deploy applications in a YARN cluster. What does Slider 
>> get? We'd get to be the foundation for long lived YARN services with the new 
>> work on top.
>> 
>> 
>> Would this work? What's wrong with the idea? How do we do it if we want to 
>> go with it?
>> 
>> 
>> I would like to call upon the community to weigh in with their thoughts and 
>> opinions on this topic.
>> 
>> -Gour
>> 
>> 
>

Re: Long running applications in a non-Kerberos cluster?

2016-03-10 Thread Jon Maron


On Mar 9, 2016, at 11:19 PM, Josh Elser 
<josh.el...@gmail.com<mailto:josh.el...@gmail.com>> wrote:

Aww, thanks for the kind words to close it out :). Glad to hear it's worked 
well for you.

Just to share some Kerberos knowledge (and dispel any misinformation), any 
application which stops working after the default ticket lifetime is "doing it 
wrong" (tm). Like Steve pointed out, this is why renewals exist (often a 
dedicated thread running inside the application). Once you have a ticket, as 
long as you ask the KDC for a renewal before that original ticket expires, you 
can get a new one.

Does renewal work past the expiry period?  Generally, the default renewal 
period is defined as 24 hours, the default expiry period as 7 days.  My 
impression was that you could continually renew every renewal period during the 
expiry period, but ultimately would be required to obtain a new token at the 7 
day mark:

public static final String DFS_NAMENODE_DELEGATION_TOKEN_RENEW_INTERVAL_KEY =
    "dfs.namenode.delegation.token.renew-interval";
public static final long DFS_NAMENODE_DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT =
    24*60*60*1000;  // 1 day
public static final String DFS_NAMENODE_DELEGATION_TOKEN_MAX_LIFETIME_KEY =
    "dfs.namenode.delegation.token.max-lifetime";
public static final long DFS_NAMENODE_DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT =
    7*24*60*60*1000;  // 7 days
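
To make the renew-versus-expire question concrete, here is a rough Python sketch of the arithmetic implied by the two defaults quoted above. It assumes the model described in this message (a token can be renewed once per renew interval, but never past its maximum lifetime). `renewal_times` is a hypothetical helper written for illustration; it is not part of Hadoop or Slider.

```python
# Defaults quoted above from the HDFS configuration keys (milliseconds).
RENEW_INTERVAL_MS = 24 * 60 * 60 * 1000        # 1 day
MAX_LIFETIME_MS = 7 * 24 * 60 * 60 * 1000      # 7 days


def renewal_times(renew_interval_ms, max_lifetime_ms):
    """Return the latest offsets (ms since issue) at which a renewal is due.

    Assumed model: each renewal extends the token by one renew interval,
    and no renewal can extend it beyond the maximum lifetime.
    """
    times = []
    t = renew_interval_ms
    while t < max_lifetime_ms:
        times.append(t)
        t += renew_interval_ms
    return times


if __name__ == "__main__":
    times = renewal_times(RENEW_INTERVAL_MS, MAX_LIFETIME_MS)
    print("renewals needed:", len(times))
    print("new token needed by (ms):", MAX_LIFETIME_MS)
```

Under these assumptions the defaults give one renewal per day for six days, after which a brand-new token has to be obtained by the 7-day mark. That lines up with the 7-day figure discussed in this thread.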


Tim I wrote:
Great!  Thank you all.

Regarding 0.50, it's been a while, but I recall back porting a couple of
fixes to get my cluster up.  IIRC, that was primarily related to kerberos
fixes for instantiating the cluster.

In 0.50 (+ the patches) accumulo and storm required a restart after 7
days.  Barring that inconvenience, it worked great in a moderately sized
production environment.

Thanks again everyone.

Tim
On Mar 9, 2016 8:59 AM, "Jon 
Maron"<jma...@hortonworks.com<mailto:jma...@hortonworks.com>>  wrote:

On Mar 8, 2016, at 11:36 PM, Tim 
I<t...@timisrael.com<mailto:t...@timisrael.com>>  wrote:

Hi Josh,
Basically anything with a kerberos ticket could no longer communicate
with anything else after 7 days due to the default config for the kerberos
server:
renew_lifetime = 7d

The delegation token I believe was the reason for this since it didn't have
access to the original service keytab.

However, what I want to verify is if delegation tokens play any role in
non-kerberized clusters and if there is anything else that might inhibit
long running services in that environment.

My suspicion is probably not.  I'll continue testing.  If anyone knows
definitively, I'd love to hear about it.

You are correct - delegation tokens should not play a role in a non-secure
environment.

What was the specific nature of the issues?  i.e. which component ended up
seeing the expiration issue?  There has been work to resolve that issue
since 0.50.

Thanks!

Tim
On Mar 8, 2016 11:13 PM, "Josh 
Elser"<josh.el...@gmail.com<mailto:josh.el...@gmail.com>>  wrote:

Hi Tim,

I wish I definitively knew the current state of things, but I don't
anymore. I agree with your assessment though -- there should be no hard
limit (the current docs state this as requirements too,
http://slider.incubator.apache.org/docs/security.html).

Was the 7-day expiration you referred to in the Slider app-master? Or the
application Slider was running (e.g. HBase)?

Tim I wrote:

Hi all,

My previous experience with Slider is in a Kerberized cluster using Slider
0.50.  It required me to restart the apps every 7 days (due to ticket
expiration).

Based on what I've read, I don't think there is anything preventing a
non-Kerberized cluster from running apps indefinitely.  Is that correct,
or am I missing something?

I'm currently using Hadoop 2.6.0 if it matters.

Thanks,

Tim

Re: Long running applications in a non-Kerberos cluster?

2016-03-09 Thread Jon Maron


> On Mar 8, 2016, at 11:36 PM, Tim I  wrote:
> 
> Hi Josh,
> Basically anything with a kerberos ticket could no longer communicate
> with anything else after 7 days due to the default config for the kerberos
> server :
> renew_lifetime = 7d
> 
> The delegation token I believe was the reason for this since it didn't have
> access to the original service keytab.
> 
> However, what I want to verify is if delegation tokens play any role in
> non-kerberized clusters and if there is anything else that might inhibit
> long running services in that environment.
> 
> My suspicion is probably not.  I'll continue testing.  If anyone knows
> definitively, I'd love to hear about it.

You are correct - delegation tokens should not play a role in a non-secure 
environment.

What was the specific nature of the issues?  i.e. which component ended up 
seeing the expiration issue?  There has been work to resolve that issue since 
0.50.

> 
> Thanks!
> 
> Tim
> On Mar 8, 2016 11:13 PM, "Josh Elser"  wrote:
> 
>> Hi Tim,
>> 
>> I wish I definitively knew the current state of things, but I don't
>> anymore. I agree with your assessment though -- there should be no hard
>> limit (the current docs state this as requirements too,
>> http://slider.incubator.apache.org/docs/security.html).
>> 
>> Was the 7-day expiration you referred to in the Slider app-master? Or the
>> application Slider was running (e.g. HBase)?
>> 
>> Tim I wrote:
>> 
>>> Hi all,
>>> 
>>> My previous experience with Slider is in a Kerberized cluster using Slider
>>> 0.50.  It required me to restart the apps every 7 days (due to ticket
>>> expiration).
>>> 
>>> Based on what I've read, I don't think there is anything preventing a
>>> non-Kerberized cluster from running apps indefinitely.  Is that correct,
>>> or am I missing something?
>>> 
>>> I'm currently using Hadoop 2.6.0 if it matters.
>>> 
>>> Thanks,
>>> 
>>> Tim
>>> 
>>>

Re: Global Allocated Ports Pool

2016-03-01 Thread Jon Maron


On Mar 1, 2016, at 9:25 AM, Roshan Punnoose 
<rosh...@gmail.com<mailto:rosh...@gmail.com>> wrote:

Would I be able to specify a different WAR for each instance of the slider
package, and change the application name?

That would generally require either a different package or utilizing one of the 
command line launch options:  
http://slider.apache.org/docs/slider_specs/simple_pkg.html

The name is specified during creation, so it could be changed.


Ok the port pool is interesting. So if I specify a port range here, can I
use the same port pool in a jetty configuration such as
${JETTY_PORT_POOL}{PER_CONTAINER} so that the different instances of the
slider application would pull from the same pool?

The port “pool” specifies a range of ports to select per host.  When the agent 
starts in a given container on a node, the agent will attempt to allocate a 
port in the specified range.  The setting would more than likely have the form 
of:

"site.server.port": "${JETTY_SERVER.ALLOCATED_PORT}{PER_CONTAINER}"

where JETTY_SERVER is the defined component name for the server.
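
As a rough illustration of the per-host behaviour described above (try ports from the allowed range until one can be claimed), here is a minimal Python sketch. It is not the Slider agent's actual code: `allocate_port` and `port_is_free` are hypothetical names, and the range 48000-48005 is an arbitrary example.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Best-effort check: a port counts as free if we can bind to it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            return False


def allocate_port(low, high, is_free=port_is_free):
    """Return the first port in [low, high] that is_free accepts."""
    for port in range(low, high + 1):
        if is_free(port):
            return port
    raise RuntimeError("no free port in range %d-%d" % (low, high))


if __name__ == "__main__":
    # Pretend two other containers on this host already hold these ports.
    taken = {48000, 48001}
    print(allocate_port(48000, 48005, is_free=lambda p: p not in taken))
```

Because the check happens on the host where the container starts, two application instances that share a range on the same node would simply end up on different ports. A bind-then-release probe like this one is inherently racy, so treat it only as an illustration of the idea.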


On Tue, Mar 1, 2016 at 9:18 AM Jon Maron 
<jma...@hortonworks.com<mailto:jma...@hortonworks.com>> wrote:


On Mar 1, 2016, at 9:07 AM, Roshan Punnoose 
<rosh...@gmail.com<mailto:rosh...@gmail.com>> wrote:

Hi,

On my current application, we are looking to deploy jetty apps in Slider,
to allow the ability to provision them in Yarn and also to flex them as
needed. What I would like to do is have a separate slider application per
Jetty Base/WAR that I am deploying, and have Slider negotiate the ports
based on a global port pool. Is this possible?

For the first requirement, would it be sufficient to define a jetty app
package and simply deploy multiple instances?  You’d be leveraging the same
jetty instance, but starting separate instances in the cluster (the app
name would be changed, and you could do some modification of the app
configuration and resources)

For the second - is the port pool a set of ports allocated across the
cluster, or a defined set of ports allowed on a per host basis?  For the
latter there is an existing configuration:
http://slider.apache.org/docs/configuration/core.html#controlling-assigned-port-ranges

Basically, I just want to
make sure that multiple slider applications can use the same port pool
without stepping on each other.

Roshan

Re: Global Allocated Ports Pool

2016-03-01 Thread Jon Maron

> On Mar 1, 2016, at 9:07 AM, Roshan Punnoose  wrote:
> 
> Hi,
> 
> On my current application, we are looking to deploy jetty apps in Slider,
> to allow the ability to provision them in Yarn and also to flex them as
> needed. What I would like to do is have a separate slider application per
> Jetty Base/WAR that I am deploying, and have Slider negotiate the ports
> based on a global port pool. Is this possible?

For the first requirement, would it be sufficient to define a jetty app package 
and simply deploy multiple instances?  You’d be leveraging the same jetty 
instance, but starting separate instances in the cluster (the app name would be 
changed, and you could do some modification of the app configuration and 
resources)

For the second - is the port pool a set of ports allocated across the cluster, 
or a defined set of ports allowed on a per host basis?  For the latter there is 
an existing configuration:  
http://slider.apache.org/docs/configuration/core.html#controlling-assigned-port-ranges

> Basically, I just want to
> make sure that multiple slider applications can use the same port pool
> without stepping on each other.
> 
> Roshan

Re: slider 0.80 error on CDH 5.5.1 - "User is not based on a keytab in a secure deployment"

2016-02-09 Thread Jon Maron

The slider fix was simply a logging improvement, I believe.  The real issue is 
the way the JDK changed the Kerberos login module in later versions of JDK 7 
and JDK 8 (https://issues.apache.org/jira/browse/HADOOP-10786).  I assume that 
your version of CDH is manifesting this latter issue.  Can you verify whether 
it has the fix for the given hadoop JIRA?

— Jon

On Feb 9, 2016, at 12:22 PM, Manoj Samel wrote:

Any help on this will be appreciated !!!

Thanks,

Manoj

On Mon, Feb 8, 2016 at 5:54 PM, Manoj Samel wrote:

Hi,

Slider 0.80 running on CDH 5.5.1 secured cluster fails to create slider-AM
with error "User is not based on a keytab in a secure deployment".

1) I had the same slider version working on CDH 5.4.2 without
any issues.
2) I thought the issue was in slider 0.81 (
https://issues.apache.org/jira/browse/SLIDER-1010) - is this affecting
slider 0.80 as well ?

Thanks for any feedback 

Following is full trace

2016-02-09 00:45:25,932 [main] INFO  appmaster.SliderAppMaster - Slider AM
Security Mode: KEYTAB
2016-02-09 00:45:25,932 [main] INFO  appmaster.SliderAppMaster - Token
YARN_AM_RM_TOKEN
2016-02-09 00:45:25,932 [main] INFO  appmaster.SliderAppMaster - Token
HDFS_DELEGATION_TOKEN
2016-02-09 00:45:25,932 [main] INFO  appmaster.SliderAppMaster - Token
kms-dt
2016-02-09 00:45:25,951 [main] INFO  security.SecurityConfiguration -
Leveraging host keytab file /etc/hadoop/conf/XXX.keytab for login
2016-02-09 00:45:25,968 [main] INFO  security.UserGroupInformation - Login
successful for user XXX using keytab file /etc/hadoop/conf/XXX.keytab
2016-02-09 00:45:25,968 [main] ERROR main.ServiceLauncher - User is not
based on a keytab in a secure deployment.
2016-02-09 00:45:25,969 [main] INFO  util.ExitUtil - Exiting with status 70
2016-02-09 00:45:25,974 [Thread-1] INFO  ipc.Server - Stopping server on
1024
2016-02-09 00:45:25,975 [IPC Server listener on 1024] INFO  ipc.Server -
Stopping IPC Server listener on 1024
2016-02-09 00:45:25,975 [IPC Server Responder] INFO  ipc.Server - Stopping
IPC Server Responder
2016-02-09 00:45:25,975 [Thread-1] INFO  appmaster.SliderAppMaster -
Process has exited with exit code 0 mapped to 0 -ignoring
2016-02-09 00:45:25,975 [AMRM Callback Handler Thread] INFO
impl.AMRMClientAsyncImpl - Interrupted while waiting for queue
java.lang.InterruptedException
   at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
   at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
   at
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
   at
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:274)

Re: Valid characters in component name

2016-01-21 Thread Jon Maron

yes - please file a JIRA.  I was hoping to get an opportunity but my other work 
responsibilities have not allowed me the time.


> On Jan 21, 2016, at 12:15 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
> 
> Hello,
> 
> Any update on this ? Should I file a Jira ?
> 
> Thanks,
> 
> On Tue, Jan 19, 2016 at 8:58 AM, Manoj Samel <manojsamelt...@gmail.com>
> wrote:
> 
>> Hi Jon,
>> 
>> Did you get a chance to try to reproduce the issue with component
>> containing a dash ?
>> 
>> Thanks,
>> 
>> Manoj
>> 
>> On Thu, Jan 14, 2016 at 10:42 AM, Gour Saha <gs...@hortonworks.com> wrote:
>> 
>>> Ok great.
>>> 
>>> Steve,
>>> This is interesting. I don’t see anywhere we validate the component name
>>> against a dash. Am I missing something in the code :(
>>> 
>>> -Gour
>>> 
>>> On 1/14/16, 10:33 AM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>> 
>>>> @Gour - yes, it does work when component name does not have a dash. It
>>>> also
>>>> works when component name has underscore(s)
>>>> 
>>>> Thanks,
>>>> 
>>>> Manoj
>>>> 
>>>> On Thu, Jan 14, 2016 at 10:25 AM, Gour Saha <gs...@hortonworks.com>
>>> wrote:
>>>> 
>>>>> Ah, my bad. Nevertheless can I assume the app works fine when you do
>>> not
>>>>> include dash in component name?
>>>>> 
>>>>> -Gour
>>>>> 
>>>>> On 1/14/16, 9:42 AM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>>>> 
>>>>>> Hi Gour,
>>>>>> 
>>>>>> 1. The issue is with component name. The code you pointed is for
>>>>> cluster
>>>>>> name
>>>>>> 2. As mentioned above, the component name does not seem to allow dash
>>>>> "-",
>>>>>> it does allow underscore.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Manoj
>>>>>> 
>>>>>> On Thu, Jan 14, 2016 at 9:24 AM, Gour Saha <gs...@hortonworks.com>
>>>>> wrote:
>>>>>> 
>>>>>>> The current acceptable name pattern is [a-z][a-z0-9_-]*
>>>>>>> (SliderUtils.java
>>>>>>> - clusternamePattern). It includes the dash (-).
>>>>>>> 
>>>>>>> While Jon experiments with your package, just wondering your app
>>>>> works
>>>>>>> fine when you get rid of the dash, right?
>>>>>>> 
>>>>>>> -Gour
>>>>>>> 
>>>>>>> On 1/14/16, 8:35 AM, "Jon Maron" <jma...@hortonworks.com> wrote:
>>>>>>> 
>>>>>>>> Sorry - got pulled off on some other tasks.  I'll try to take a
>>> look
>>>>>>> over
>>>>>>>> the next day or so.
>>>>>>>> 
>>>>>>>>> On Jan 14, 2016, at 11:31 AM, Manoj Samel
>>>>> <manojsamelt...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> Any update on this ? Did you get a chance to try the memcache
>>> with
>>>>>>> the
>>>>>>>>> component name I used above ?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> Manoj
>>>>>>>>> 
>>>>>>>>> On Tue, Jan 12, 2016 at 4:37 PM, Steve Loughran
>>>>>>> <ste...@hortonworks.com
>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On 12 Jan 2016, at 12:14, Jon Maron <jma...@hortonworks.com>
>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> OK.  So was there a '-' or some other character in your
>>>>> component
>>>>>>>>>>> name?
>>>>>>>>>> A '-' should work.  The component names are currently expected
>>> to
>>>>>>>>>> follow a
>>>>>>>>>> naming convention that allows for DNS compatible names, and
>>>>> dashes
>>>>>>> are
>>>>>>>>>> included in that character set.  The fact that the endpoint did
>>>>> not
>>>>>>>>>> appear
>>>>>>>>>> may be related to some other issue.  The AM logs may help here
>>> as
>>>>>>> well.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> We should be checking component names early on, because it would
>>>>>>> really
>>>>>>>>>> break bits of the REST API too.
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>>

Re: Valid characters in component name

2016-01-14 Thread Jon Maron

We may want to eliminate underscores given their issues with DNS etc.
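
To illustrate the difference, here is a small Python sketch comparing the name pattern quoted in this thread (`[a-z][a-z0-9_-]*`, from `clusternamePattern`) with a stricter DNS-label-style pattern. The stricter pattern is an assumption modelled on RFC 1035 labels, not something taken from the Slider code, and `matches` is a hypothetical helper.

```python
import re

# Pattern quoted in this thread (SliderUtils.java, clusternamePattern).
CLUSTER_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_-]*")

# Assumed stricter alternative: starts with a letter, ends with a letter or
# digit, hyphens allowed inside, no underscores, at most 63 characters.
DNS_LABEL_PATTERN = re.compile(r"[a-z]([a-z0-9-]{0,61}[a-z0-9])?")


def matches(pattern, name):
    """True if the whole name matches the given compiled pattern."""
    return pattern.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ("my-name", "my_name", "My-name"):
        print(name,
              matches(CLUSTER_NAME_PATTERN, name),
              matches(DNS_LABEL_PATTERN, name))
```

Under these patterns `my-name` passes both checks, while `my_name` passes only the current pattern, which is the case that dropping underscores would rule out.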


> On Jan 14, 2016, at 12:26 PM, Gour Saha <gs...@hortonworks.com> wrote:
> 
> The current acceptable name pattern is [a-z][a-z0-9_-]*  (SliderUtils.java
> - clusternamePattern). It includes the dash (-).
> 
> While Jon experiments with your package, just wondering your app works
> fine when you get rid of the dash, right?
> 
> -Gour
> 
>> On 1/14/16, 8:35 AM, "Jon Maron" <jma...@hortonworks.com> wrote:
>> 
>> Sorry - got pulled off on some other tasks.  I'll try to take a look over
>> the next day or so.
>> 
>>> On Jan 14, 2016, at 11:31 AM, Manoj Samel <manojsamelt...@gmail.com>
>>> wrote:
>>> 
>>> Hi,
>>> 
>>> Any update on this ? Did you get a chance to try the memcache with the
>>> component name I used above ?
>>> 
>>> Thanks,
>>> 
>>> Manoj
>>> 
>>> On Tue, Jan 12, 2016 at 4:37 PM, Steve Loughran <ste...@hortonworks.com>
>>> wrote:
>>> 
>>>> 
>>>>> On 12 Jan 2016, at 12:14, Jon Maron <jma...@hortonworks.com> wrote:
>>>>> 
>>>>> OK.  So was there a '-' or some other character in your component
>>>>> name?
>>>> A '-' should work.  The component names are currently expected to
>>>> follow a
>>>> naming convention that allows for DNS compatible names, and dashes are
>>>> included in that character set.  The fact that the endpoint did not
>>>> appear
>>>> may be related to some other issue.  The AM logs may help here as well.
>>>> 
>>>> 
>>>> We should be checking component names early on, because it would really
>>>> break bits of the REST API too.
> 
>

Re: Valid characters in component name

2016-01-14 Thread Jon Maron

Sorry - got pulled off on some other tasks.  I’ll try to take a look over the 
next day or so.

> On Jan 14, 2016, at 11:31 AM, Manoj Samel <manojsamelt...@gmail.com> wrote:
> 
> Hi,
> 
> Any update on this ? Did you get a chance to try the memcache with the
> component name I used above ?
> 
> Thanks,
> 
> Manoj
> 
> On Tue, Jan 12, 2016 at 4:37 PM, Steve Loughran <ste...@hortonworks.com>
> wrote:
> 
>> 
>>> On 12 Jan 2016, at 12:14, Jon Maron <jma...@hortonworks.com> wrote:
>>> 
>>> OK.  So was there a ‘-‘ or some other character in your component name?
>> A ‘-‘ should work.  The component names are currently expected to follow a
>> naming convention that allows for DNS compatible names, and dashes are
>> included in that character set.  The fact that the endpoint did not appear
>> may be related to some other issue.  The AM logs may help here as well.
>> 
>> 
>> We should be checking component names early on, because it would really
>> break bits of the REST API too.

Re: Valid characters in component name

2016-01-12 Thread Jon Maron

I don’t think attachments make their way thru the list server.  You can try 
sending it directly or I may be able to reproduce locally.

On Jan 12, 2016, at 4:27 PM, Manoj Samel 
<manojsamelt...@gmail.com<mailto:manojsamelt...@gmail.com>> wrote:

Hi Jon,

I tried the OOB jmemcached with component name my-name. You may be able to 
reproduce by trying the jmemcached with component name my-name

Attached are the slider-am log (part) and the component agent log. The 
component agent shows "Unable to connect" error as before. I have masked FQDNs 
etc. but rest of log is intact.

As far as I can see, AM seems not to get container heartbeat (since container 
start has error), marks  container lost (-100) and keeps attempting to get new 
container.

2016-01-12 20:53:46,334 [Thread-33] WARN  agent.HeartbeatMonitor - Component 
ComponentInstanceState{containerIdAsString='container_1452195922769_0005_01_02',
 state=INIT, failuresSeen=0, lastHeartbeat=1452631909186, 
containerState=UNHEALTHY, componentName='my-name'} marked UNHEALTHY. Last 
heartbeat received at 1452631909186 approx. 117148 ms. back.
2016-01-12 20:54:46,335 [Thread-33] WARN  agent.HeartbeatMonitor - Component 
ComponentInstanceState{containerIdAsString='container_1452195922769_0005_01_02',
 state=INIT, failuresSeen=0, lastHeartbeat=1452631909186, 
containerState=HEARTBEAT_LOST, componentName='my-name'} marked HEARTBEAT_LOST. 
Last heartbeat received at 1452631909186 approx. 177149 ms. back.
2016-01-12 20:54:46,335 [AmExecutor-006] INFO  appmaster.SliderAppMaster - 
containerLostContactWithProvider: container 
container_1452195922769_0005_01_02 lost
2016-01-12 20:54:46,336 [AmExecutor-006] INFO  appmaster.SliderAppMaster - 
Container released; triggering review
2016-01-12 20:54:46,336 [AmExecutor-006] INFO  state.AppState - Reviewing 
RoleStatus{name='my-name', key=1, desired=1, actual=1, requested=0, 
releasing=1, failed=0, failed recently=0, node failed=0, pre-empted=0, 
started=1, startFailed=0, completed=0, failureMessage=''} : expected 1
2016-01-12 20:54:47,340 [AMRM Callback Handler Thread] INFO  
appmaster.SliderAppMaster - onContainersCompleted([1]
2016-01-12 20:54:47,340 [AMRM Callback Handler Thread] INFO  
appmaster.SliderAppMaster - Container Completion for 
containerID=container_1452195922769_0005_01_02, state=COMPLETE, 
exitStatus=-100, diagnostics=Container released by application
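
As a quick sanity check on the numbers in the log above, the `lastHeartbeat` value is an epoch timestamp in milliseconds, and the "approx. ... ms. back" figures can be reproduced from the log line times. The Python sketch below assumes the log timestamps are in UTC (the arithmetic only works out under that assumption); the helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment):
    """Convert an aware datetime to integer milliseconds since the epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_epoch_ms(ms):
    """Convert integer milliseconds since the epoch to an aware datetime."""
    return EPOCH + timedelta(milliseconds=ms)


# Values copied from the log lines above; UTC is an assumption.
last_heartbeat_ms = 1452631909186
unhealthy_logged_at = datetime(2016, 1, 12, 20, 53, 46, 334000, tzinfo=timezone.utc)
lost_logged_at = datetime(2016, 1, 12, 20, 54, 46, 335000, tzinfo=timezone.utc)

unhealthy_gap_ms = to_epoch_ms(unhealthy_logged_at) - last_heartbeat_ms
lost_gap_ms = to_epoch_ms(lost_logged_at) - last_heartbeat_ms

if __name__ == "__main__":
    print("last heartbeat:", from_epoch_ms(last_heartbeat_ms).isoformat())
    print("gap when marked UNHEALTHY (ms):", unhealthy_gap_ms)
    print("gap when marked HEARTBEAT_LOST (ms):", lost_gap_ms)
```

So the only heartbeat the AM recorded for this container was roughly two minutes before it was marked UNHEALTHY, and the state stayed at INIT throughout. That is consistent with the agent never completing registration.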

Thanks for your time,

Manoj

On Tue, Jan 12, 2016 at 12:14 PM, Jon Maron 
<jma...@hortonworks.com<mailto:jma...@hortonworks.com>> wrote:
OK.  So was there a ‘-‘ or some other character in your component name?  A ‘-‘ 
should work.  The component names are currently expected to follow a naming 
convention that allows for DNS compatible names, and dashes are included in 
that character set.  The fact that the endpoint did not appear may be related 
to some other issue.  The AM logs may help here as well.

> On Jan 12, 2016, at 2:57 PM, Manoj Samel 
> <manojsamelt...@gmail.com<mailto:manojsamelt...@gmail.com>> wrote:
>
> Jon,
>
> I replaced  with actual component name.
>
> Thanks,
>
>
>
> On Tue, Jan 12, 2016 at 11:43 AM, Jon Maron 
> <jma...@hortonworks.com<mailto:jma...@hortonworks.com>> wrote:
>
>> Did you replace the actual comp name with , or do you actually
>> have the ‘<‘ and ‘>’ characters in the name?
>>
>>> On Jan 12, 2016, at 2:40 PM, Manoj Samel 
>>> <manojsamelt...@gmail.com<mailto:manojsamelt...@gmail.com>>
>> wrote:
>>>
>>> Slider version 0.80 with secured cluster
>>>
>>> Use case is to create a component reflecting user name. It seems the only
>>> valid character in component name besides [A-Z][a-z][0-9] is underscore
>> '_'.
>>>
>>> Attempt to create a component with characters like dash '-' or many other
>>> characters fail to bring up the component with error like below where
>>> 
>>> is the component name containing offending character
>>>
>>> INFO 2015-12-24 18:55:40,605 Controller.py:140 - Registering with the
>>> server at
>>>
>> https://host1:41613/ws/v1/slider/agents/container_1450746204314_0043_01_02___
>> /register
>>> with data '{"tags": "", "timestamp": 1450983340604, "expectedState": 0,
>>> "responseId": -1, "actualState": 0, "logFolders": {}, "agentVersion":
>> "1",
>>> "allocatedPorts": {}, "appVersion": null, "publicHostname": "host2",
>>> "label": "container_1450746204314_0043_01_02___"}'
>>> INFO 2015-12-24 18:55:40,605 security.py:89 - SSL Connect being called..
>>> connecting to the server
>>

Re: Valid characters in component name

2016-01-12 Thread Jon Maron

Did you replace the actual comp name with , or do you actually have 
the ‘<‘ and ‘>’ characters in the name?

> On Jan 12, 2016, at 2:40 PM, Manoj Samel  wrote:
> 
> Slider version 0.80 with secured cluster
> 
> Use case is to create a component reflecting user name. It seems the only
> valid character in component name besides [A-Z][a-z][0-9] is underscore '_'.
> 
> Attempt to create a component with characters like dash '-' or many other
> characters fail to bring up the component with error like below where
> 
> is the component name containing offending character
> 
> INFO 2015-12-24 18:55:40,605 Controller.py:140 - Registering with the
> server at
> https://host1:41613/ws/v1/slider/agents/container_1450746204314_0043_01_02___/register
> with data '{"tags": "", "timestamp": 1450983340604, "expectedState": 0,
> "responseId": -1, "actualState": 0, "logFolders": {}, "agentVersion": "1",
> "allocatedPorts": {}, "appVersion": null, "publicHostname": "host2",
> "label": "container_1450746204314_0043_01_02___"}'
> INFO 2015-12-24 18:55:40,605 security.py:89 - SSL Connect being called..
> connecting to the server
> INFO 2015-12-24 18:55:40,695 security.py:51 - SSL connection established.
> Two-way SSL authentication is turned off on the server.
> INFO 2015-12-24 18:55:40,745 Controller.py:183 - Unable to connect to:
> https://host1:41613/ws/v1/slider/agents/container_1450746204314_0043_01_02___/register
> 
> Traceback (most recent call last):
>  File
> "/data/yarn/local/usercache/foo/appcache/application_1450746204314_0043/filecache/10/slider-agent.tar.gz/slider-agent/agent/Controller.py",
> line 142, in registerWithServer
>    regResp = json.loads(response)
>  File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
>    return _default_decoder.decode(s)
>  File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
>    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>  File "/usr/lib64/python2.6/json/decoder.py", line 338, in raw_decode
>    raise ValueError("No JSON object could be decoded")
> ValueError: No JSON object could be decoded
> 
> 
> Any thoughts ?
> 
> Thanks,
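
The traceback above shows that the "Unable to connect" message is really a JSON parsing failure: the agent received some response body, but `json.loads` could not decode it. One plausible reading is that the server returned a non-JSON body (for example an error page) for that registration URL. The Python 3 sketch below shows how a caller could surface the raw body to make that easier to diagnose. `parse_registration_response` is a hypothetical helper, not the agent's actual code (which ran on Python 2.6).

```python
import json


def parse_registration_response(response):
    """Decode a registration response, keeping the raw text on failure."""
    try:
        return json.loads(response)
    except ValueError as exc:
        # json.JSONDecodeError is a subclass of ValueError in Python 3.
        snippet = response[:200]
        raise ValueError(
            "registration response was not JSON; body starts with %r" % snippet
        ) from exc


if __name__ == "__main__":
    print(parse_registration_response('{"responseId": 0}'))
    try:
        parse_registration_response("<html>404 Not Found</html>")
    except ValueError as exc:
        print(exc)
```

Logging the start of the body this way would show whether the server rejected the request or sent back something else entirely.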

Re: [VOTE] Apache Slider (incubating) release 0.90.2-incubating-RC1

2015-12-30 Thread Jon Maron

+1

Downloaded source tar.gz file
Found no .pyc files
verified sha1, asc, and md5
Ran full unit test suite successfully
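
For anyone wanting to repeat two of these checks, here is a minimal Python sketch that computes a file digest and lists any `.pyc` entries in a tarball. The function names and the example file name are placeholders, and signature (`.asc`) verification is not covered because it needs an external tool such as GnuPG.

```python
import hashlib
import tarfile


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pyc_members(tar_path):
    """Return the names of all .pyc entries in a (possibly compressed) tar."""
    with tarfile.open(tar_path, "r:*") as tar:
        return [m.name for m in tar.getmembers() if m.name.endswith(".pyc")]


if __name__ == "__main__":
    import sys

    # Usage: python check_release.py <source-tarball>
    path = sys.argv[1]
    print("sha1:", file_digest(path, "sha1"))
    print("md5: ", file_digest(path, "md5"))
    print(".pyc entries:", pyc_members(path))
```

The printed digests are then compared by eye (or with a script) against the published `.sha1` and `.md5` files; an empty `.pyc` list corresponds to the "Found no .pyc files" line above.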

> On Dec 29, 2015, at 5:48 AM, Steve Loughran  wrote:
> 
> Hello,
> 
> This is a call for a vote on the Apache Slider (incubating) release 
> 0.90.2-incubating-RC1. It's the exact same codebase as in the previous RC, 
> except the package/release process now makes sure that **/*.pyc is deleted 
> during cleanup. Josh has a better fix for the issue (SLIDER-1039); I'm not 
> applying it here to avoid the full retest sequence.
> 
> 
> Issues fixed:
> https://issues.apache.org/jira/browse/SLIDER/fixforversion/12332352/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel
> 
> Source artifacts:
> https://dist.apache.org/repos/dist/release/incubator/slider/0.90.2-incubating-RC1
> 
> Staged artifacts:
> https://repository.apache.org/content/repositories/orgapacheslider-1013/
> 
> Git source:
> https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h=9bb379f83c78b61aeb4110b93796bc02c08c4226
> SHA1: 9bb379f83c78b61aeb4110b93796bc02c08c4226
> 
> PGP key:
> http://pgp.mit.edu:11371/pks/lookup?op=vindex=ste...@apache.org
> 
> 
> [ ] +1 Release Apache Slider (incubating) 0.90.2-incubating-RC1
> [ ] 0
> [ ] -1 Do not release Apache Slider (incubating) 0.90.2-incubating-RC1
> 
> Voting lasts 72h, though as a lot of people are on vacation, and as I'm 
> heading offline in 12 minutes & staying that way until Jan 4, I'm going to 
> allow for some late votes from others who are not at their keyboards so much 
> this week
> 
> 
> Please review and vote
> 
> -Steve

Re: [VOTE] Apache Slider (incubating) release 0.90.2-incubating-RC1

2015-12-29 Thread Jon Maron

Does look like this issue:  https://bugs.openjdk.java.net/browse/JDK-8051012.  
I can find instances on the web referencing the version you use with this issue 
(groovy compatibility issue)

> On Dec 29, 2015, at 12:31 PM, Ted Yu  wrote:
> 
> java version "1.7.0_67"
> Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
> 
> On Tue, Dec 29, 2015 at 9:29 AM, Josh Elser  wrote:
> 
>> I didn't see this myself, but I also didn't do the cluster setup (so
>> probably lots of my tests just no-op'ed themselves).
>> 
>> Out of curiosity, were you running OpenJDK or Oracle?
>> 
>> Ted Yu wrote:
>> 
>>> I ran test suite (against hadoop 2.7.0) on Linux and got:
>>> 
>>> Tests in error:
>>>   TestAgentClientProvider2.initializationError » Verify Bad  method
>>> call f...
>>>   TestSliderClientMethods.initializationError » Verify Bad  method
>>> call fr...
>>> 
>>> 
>>> initializationError(org.apache.slider.providers.agent.TestAgentClientProvider2)
>>>  Time elapsed: 0.007 sec<<<  ERROR!
>>> java.lang.VerifyError: Bad  method call from inside of a branch
>>> Exception Details:
>>>   Location:
>>> org/apache/slider/providers/agent/TestAgentClientProvider2.()V
>>> @34: invokespecial
>>>   Reason:
>>> Error exists in the bytecode
>>>   Bytecode:
>>> 000: 2a4c 1300 1db8 0023 03bd 0018 1300 24b8
>>> 
>>> java version "1.7.0_67"
>>> 
>>> 
>>> Has anyone seen similar error ?
>>> 
>>> Cheers
>>> 
>>> On Tue, Dec 29, 2015 at 2:48 AM, Steve Loughran
>>> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> This is a call for a vote on the Apache Slider (incubating) release
>>>> 0.90.2-incubating-RC1. It's the exact same codebase as in the previous RC,
>>>> except the package/release process now makes sure that **/*.pyc is deleted
>>>> during cleanup. Josh has a better fix for the issue (SLIDER-1039); I'm not
>>>> applying it here to avoid the full retest sequence.
>>>> 
>>>> Issues fixed:
>>>> https://issues.apache.org/jira/browse/SLIDER/fixforversion/12332352/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel
>>>> 
>>>> Source artifacts:
>>>> https://dist.apache.org/repos/dist/release/incubator/slider/0.90.2-incubating-RC1
>>>> 
>>>> Staged artifacts:
>>>> https://repository.apache.org/content/repositories/orgapacheslider-1013/
>>>> 
>>>> Git source:
>>>> https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h=9bb379f83c78b61aeb4110b93796bc02c08c4226
>>>> SHA1: 9bb379f83c78b61aeb4110b93796bc02c08c4226
>>>> 
>>>> PGP key:
>>>> http://pgp.mit.edu:11371/pks/lookup?op=vindex=ste...@apache.org
>>>> 
>>>> [ ] +1 Release Apache Slider (incubating) 0.90.2-incubating-RC1
>>>> [ ] 0
>>>> [ ] -1 Do not release Apache Slider (incubating) 0.90.2-incubating-RC1
>>>> 
>>>> Voting lasts 72h, though as a lot of people are on vacation, and as I'm
>>>> heading offline in 12 minutes & staying that way until Jan 4, I'm going to
>>>> allow for some late votes from others who are not at their keyboards so
>>>> much this week
>>>> 
>>>> Please review and vote
>>>> 
>>>> -Steve
>>> 
>>>

Re: ZK registry entries cleanup for slider app

2015-12-24 Thread Jon Maron


> On Dec 23, 2015, at 6:27 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
> 
> Jon, as to your last comment -
> 
> As I understand, the idea of pushing the slider registry to the more general
> yarn registry was to make it easier for clients of an application to query,
> without going through slider (URIs) at all. Most clients would care about info
> for a live application. If a client query on the ZK registry can't tell whether
> the application is live, then it's not clear what the value of the registry is
> for clients of the application - there has to be some path in the registry
> that is tied to the liveness of the application.

As Steve indicated, there is a patch that will be merged after the next release 
that does, in fact, purge the registry entries during a stop (SLIDER-1017).  
That should address the concern about a mismatch in the information available 
from the two sources (ZK vs AM).  

I think the registry being moved to YARN was more an effort to make its 
facilities available as a resource for all YARN applications rather than Slider 
specifically.  But I would also agree that its utility as a “liveness” 
indicator is enhanced by this fix.

> 
> If slider wants to keep the configurations of stopped applications around,
> would some other ZK path or may be HDFS be a store for it ?
> 
> Thoughts?
> 
> Manoj
> 
> 
> 
> On Wed, Dec 23, 2015 at 12:36 PM, Steve Loughran <ste...@hortonworks.com>
> wrote:
> 
>> there's a patch up to do the deletion (best effort) when things get taken
>> down...I plan to pull it in once we've got this 0.90 release out
>> 
>>> On 23 Dec 2015, at 20:21, Jon Maron <jma...@hortonworks.com> wrote:
>>> 
>>>> 
>>>> On Dec 23, 2015, at 3:13 PM, Manoj Samel <manojsamelt...@gmail.com>
>> wrote:
>>>> 
>>>> Thanks Jon,
>>>> 
>>>> I thought the ZK registry was to reflect the live information about the
>>>> running application (i.e. I expected it to fail with no nodes found when
>>>> app is not running) but that does not seems to be the intent.
>>>> 
>>>> On the other hand,
>>>> http://
>> :1025/ws/v1/slider/publisher/slider/componentinstancedata
>>>> <
>> http://ip-10-222-0-38.us-west-2.compute.internal:1025/ws/v1/slider/publisher/slider/componentinstancedata
>>> 
>>>> gives
>>>> info only when app is running. Should the clients be quering this URL
>>>> rather than ZK nodes to if they expect to get info only when app is
>> running
>>>> ?
>>> 
>>> I would say that, in general, to get information about the live
>> instances it is best to leverage the URIs.  They generally provide access
>> to in-memory or registry based information.
>>> 
>>>> 
>>>> Thanks,
>>>> 
>>>> Manoj
>>>> 
>>>> On Wed, Dec 23, 2015 at 11:12 AM, Jon Maron <jma...@hortonworks.com>
>> wrote:
>>>> 
>>>>> 
>>>>>> On Dec 23, 2015, at 2:03 PM, Manoj Samel <manojsamelt...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Setup - slider version .80, secure hadoop cluster, registry enabled
>> with
>>>>>> security
>>>>>> 
>>>>>> Noticed using *** zookeeper client (zkCli.sh) *** that when slider
>>>>>> application is stopped, the registry entries in
>>>>>> zookeeper
>> /registry/users/xxx/services/org-apache-slider/abc/components
>>>>> do
>>>>>> not get deleted. However, they do get updated when application is
>> started
>>>>>> again and new container IDs get allocated. In this case, the
>>>>>> components/containers_ do reflect new container IDs. Also noticed
>> that ZK
>>>>>> registry entries do get deleted if "destroy" applications is done.
>>>>>> 
>>>>>> Is this expected behavior ? Shouldn't the ZK entries be deleted when
>>>>>> application and its containers are not present after application is
>>>>> stopped
>>>>>> ?
>>>>> 
>>>>> Not exactly.  You could think of start/stop as thaw/freeze (their
>> actual
>>>>> previous incarnation).  The idea is that a “stop” terminates the
>>>>> application instance but maintains the configuration so that another
>>>>> instance can be started based on the same configuration.  You probably
>>>>> could also think of the ZK config as the “class definition” and the
>> actual
>>>>> launched instance as the “Object”.
>>>>> 
>>>>>> 
>>>>>> Thanks in advance,
>>>>>> 
>>>>>> Manoj
>> 
>>

Re: ZK registry entries cleanup for slider app

2015-12-23 Thread Jon Maron

> On Dec 23, 2015, at 2:03 PM, Manoj Samel  wrote:
> 
> Hi,
> 
> Setup - slider version .80, secure hadoop cluster, registry enabled with
> security
> 
> Noticed using *** zookeeper client (zkCli.sh) *** that when slider
> application is stopped, the registry entries in
> zookeeper /registry/users/xxx/services/org-apache-slider/abc/components do
> not get deleted. However, they do get updated when application is started
> again and new container IDs get allocated. In this case, the
> components/containers_ do reflect new container IDs. Also noticed that ZK
> registry entries do get deleted if "destroy" applications is done.
> 
> Is this expected behavior ? Shouldn't the ZK entries be deleted when
> application and its containers are not present after application is stopped
> ?

Not exactly.  You could think of start/stop as thaw/freeze (their actual 
previous incarnation).  The idea is that a “stop” terminates the application 
instance but maintains the configuration so that another instance can be 
started based on the same configuration.  You probably could also think of the 
ZK config as the “class definition” and the actual launched instance as the 
“Object”.
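
To make the analogy a little more concrete, here is a toy sketch. Nothing in it is Slider code: AppDefinition, RunningInstance and the container id format are hypothetical names chosen only to show that a stop ends the instance while the definition it was launched from survives and can be started again.

```python
class AppDefinition:
    """Stands in for the persisted configuration (the "class definition")."""

    def __init__(self, name, config):
        self.name = name
        self.config = dict(config)
        self._launches = 0

    def start(self):
        """Launch a fresh running instance (the "object") from the kept config."""
        self._launches += 1
        return RunningInstance(self, "container_%d" % self._launches)


class RunningInstance:
    """Stands in for one launched application instance."""

    def __init__(self, definition, container_id):
        self.definition = definition
        self.container_id = container_id
        self.alive = True

    def stop(self):
        """End this instance; the definition it came from is left untouched."""
        self.alive = False
```

Starting, stopping and starting again yields a second instance with a new container id, built from the same unchanged config, which is the behaviour described for the registry entries.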

> 
> Thanks in advance,
> 
> Manoj

Re: ZK registry entries cleanup for slider app

2015-12-23 Thread Jon Maron


> On Dec 23, 2015, at 3:13 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
> 
> Thanks Jon,
> 
> I thought the ZK registry was to reflect the live information about the
> running application (i.e. I expected it to fail with no nodes found when
> app is not running) but that does not seem to be the intent.
> 
> On the other hand,
> http://:1025/ws/v1/slider/publisher/slider/componentinstancedata
> <http://ip-10-222-0-38.us-west-2.compute.internal:1025/ws/v1/slider/publisher/slider/componentinstancedata>
> gives
> info only when app is running. Should the clients be querying this URL
> rather than ZK nodes if they expect to get info only when app is running
> ?

I would say that, in general, to get information about the live instances it is 
best to leverage the URIs.  They generally provide access to in-memory or 
registry based information.
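
As a rough illustration of a client going to the URI rather than ZK, something like the following could be used. The path is the one from the URL quoted above; everything else is an assumption for the sake of the sketch: the function names are made up, port 1025 is simply the port in that URL, the response is assumed to be JSON, and a secure cluster would need authentication that is not handled here.

```python
import json
from urllib.request import urlopen

# Path taken from the URL quoted in this thread.
PUBLISHER_PATH = "/ws/v1/slider/publisher/slider/componentinstancedata"


def publisher_url(am_host, port=1025):
    """Build the publisher URL for a given AM host (port is an assumed default)."""
    return "http://%s:%d%s" % (am_host, port, PUBLISHER_PATH)


def fetch_component_instance_data(am_host, port=1025, timeout=10):
    """Return the decoded response, or None if the request fails.

    A failed request (connection refused, timeout, HTTP error) is treated as
    "no live application", matching the behaviour described above.
    """
    try:
        with urlopen(publisher_url(am_host, port), timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except OSError:
        return None
```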

> 
> Thanks,
> 
> Manoj
> 
> On Wed, Dec 23, 2015 at 11:12 AM, Jon Maron <jma...@hortonworks.com> wrote:
> 
>> 
>>> On Dec 23, 2015, at 2:03 PM, Manoj Samel <manojsamelt...@gmail.com>
>> wrote:
>>> 
>>> Hi,
>>> 
>>> Setup - slider version .80, secure hadoop cluster, registry enabled with
>>> security
>>> 
>>> Noticed using *** zookeeper client (zkCli.sh) *** that when slider
>>> application is stopped, the registry entries in
>>> zookeeper /registry/users/xxx/services/org-apache-slider/abc/components
>> do
>>> not get deleted. However, they do get updated when application is started
>>> again and new container IDs get allocated. In this case, the
>>> components/containers_ do reflect new container IDs. Also noticed that ZK
>>> registry entries do get deleted if "destroy" applications is done.
>>> 
>>> Is this expected behavior ? Shouldn't the ZK entries be deleted when
>>> application and its containers are not present after application is
>> stopped
>>> ?
>> 
>> Not exactly.  You could think of start/stop as thaw/freeze (their actual
>> previous incarnation).  The idea is that a “stop” terminates the
>> application instance but maintains the configuration so that another
>> instance can be started based on the same configuration.  You probably
>> could also think of the ZK config as the “class definition” and the actual
>> launched instance as the “Object”.
>> 
>>> 
>>> Thanks in advance,
>>> 
>>> Manoj
>> 
>>

Re: DISCUSS: SLIDER-1020 Slider client to copy slider.* to AM config

2015-12-14 Thread Jon Maron


> On Dec 14, 2015, at 9:11 AM, Steve Loughran  wrote:
> 
> 
> I'm doing some work getting the AM to come up as HTTPS (primarily for 
> tests related to SLIDER-907 and an HTTPS RM)

So you’re implementing HTTPS at the RM end?  I imagine you’ll need access to 
the AM cert, etc.  We may want to think more broadly about certificates in the 
cluster, given that Python 2.7.9 is going to break the current Slider and 
Ambari agent implementations.

> 
> for this to be a lot easier to test, I'm thinking I'd like the slider client 
> to propagate all the slider.* configuration parameters
> 
> Or
> 
> we change the AM to read everything beginning with slider. from the 
> appconf/components/slider-appmaster component and set them in the 
> configuration. This is already done for some of the security stuff (keytab, 
> etc)
> 
> Option 1: lets you put things in slider-client.xml, so have some options 
> which are global to all instances
> 
> Option 2: lets you keep the configuration in your application config.
> 
> 
> What do people think is best? I think I might like to go with Option #2 for 
> now, as option #1 could be added on later

+1
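
For what it's worth, my reading of Option 2 is roughly the following. This is a sketch only, not the AM code: the function names and the option keys used in any example are hypothetical, and plain dicts stand in for the appconf component and the AM configuration.

```python
def slider_options(component_options, prefix="slider."):
    """Pick out the options whose key starts with `prefix`."""
    return {
        key: value
        for key, value in component_options.items()
        if key.startswith(prefix)
    }


def apply_to_configuration(configuration, component_options):
    """Copy the slider.* options into the configuration dict.

    Returns the sorted list of keys that were set, which is handy for logging.
    """
    selected = slider_options(component_options)
    configuration.update(selected)
    return sorted(selected)
```

Keeping the filter separate from the copy would also make it easy to add Option 1 later, by running the same filter over the client-side settings.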

Re: Concerning Sentry: A disagreement over the Apache Way and graduation

2015-11-10 Thread Jon Maron


> On Nov 10, 2015, at 3:38 PM, Steve Loughran  wrote:
> 
> From incubator-general
> 
> This is interesting, and I think we need to make sure we aren't going to go 
> the same way.
> 
> Part of the problem is, IMO, simply that JIRA-first development gets in the 
> way of broader discussions. I see that across projects, including Hadoop, 
> Spark & others. It's a great tool from a coding perspective, but I'm not 
> convinced it's so good for setting a shared vision of where a project should 
> be going.
> 
> One thing I think we could do, other than talk more across the list, is set 
> up some hangouts (or worse, webex) chats with people using/developing with 
> Slider. I'm in GMT+000 right now, so can talk mornings my time/evenings asia, 
> or evenings my time/mornings US, and my sunnyvale colleagues could round out 
> the cycle with a US/asia chat.
> 
> who would be interested in some video conferences next week? Set a date and 
> we can work out an agenda. I'll gladly talk about where I've been going with 
> anti-affinity  & even show some of the code

Sounds like a good idea to me - I’d be interested. 

> 
> -steve
> 
> 
> Begin forwarded message:
> 
> From: Joe Brockmeier
> Date: 2 November 2015 at 11:59:15 GMT
> To: General Apache Incubator
> Subject: Concerning Sentry: A disagreement over the Apache Way and graduation
> 
> Hi all,
> 
> I'm one of the mentors of Sentry, which has been in incubation for some
> time. The project has progressed in a number of ways, but my largest
> concern is that the podling is doing [in my opinion] too much
> development and discussion out-of-sight.
> 
> I've raised issues about this, as has David Nalley. David had a
> conversation with members of Sentry at ApacheCon Big Data in September,
> and that discussion was brought back to the list. [1]
> 
> Jiras are being filed, and swiftly acted on, in a way that strongly
> suggests that a lot of discussion and direction of the project are
> happening off-list and out-of-sight to the average participant. David
> and myself have suggested ways that the community can remedy this, but
> the most recent mail from Arvind indicates that he (and others in the
> podling) don't feel it is a "valid ask."
> 
> At this point, I'm raising this to general@ because I'd like second (and
> third, etc.) opinions. Perhaps I'm deeply wrong, and others here feel
> Sentry is ready to graduate. My feeling is that the podling is not
> operating in "the Apache Way" and doesn't show a great deal of interest
> in doing so. [2] To quote Arvind:
> 
> "I feel another issue being pointed out or which has been eluded to in
> the past is - who decides which Jiras should be fixed, what features to
> create etc, specially when they show up as Jira issues directly with
> patches that follow soon. It seems that in some ways the lack of using
> mailing lists directly for discussion is linked to this behavior of
> filing issues and fixing them rapidly, as if following a roadmap that
> the community does not have control over. Please pardon me if my
> interpretation/understanding of the issue is not right. But if it is
> right, then I do want to say that - that too is not an issue in my
> opinion at all. And here is why:
> 
> When someone files a Jira, they are inviting the entire community to
> comment on it and provide feedback. If it is not in the interest of the
> project, I do believe that responsible members of the community will be
> quick to bring that out for discussion and even Veto it if necessary. If
> that is not happening, it is not an issue with lack of community
> participation, but rather it is an indicator of a project team that
> knows where the gaps are and understands how to go about filling them
> intuitively."
> 
> The model that Sentry is pursuing may work very well *for the existing
> members of the podling.* In my opinion, its process is entirely too
> opaque to allow for interested parties outside of the existing podling
> and companies interested in Sentry development to become involved.
> 
> The podling is pressing to move to graduation, and I cannot in good
> conscience vote +1 or even +0 at this point. I'm strongly -1 as a mentor
> and don't feel the podling has any interest in working in "the Apache
> Way" as commonly understood. [3]
> 
> However, I feel we've reached an impasse and there's little value in
> continuing to debate amongst the mentors / podling. They've stated their
> position(s) and I've stated mine. (I *think* David Nalley is in
> agreement with me, but I don't wish to speak for him.)
> 
> I'm bringing this to the IPMC fully understanding that I might be
> totally wrong - maybe I'm holding to a too strict or outdated idea of
> how projects should operate. I'm happy to be told so if that's the case
> so I can improve as a mentor or decide to bow out from mentoring in the
> future, if it's the

Re: november report draft

2015-11-03 Thread Jon Maron

+1

> On Nov 3, 2015, at 12:59 PM, Billie Rinaldi  wrote:
> 
> Would anyone like to add something to the following report for November?
> 
> -
> Slider
> 
> Slider is a collection of tools and technologies to package, deploy, and
> manage long running applications on Apache Hadoop YARN clusters.
> 
> Slider has been incubating since 2014-04-29.
> 
> Three most important issues to address in the move towards graduation:
> 
>  1. Getting more external users
>  2. Getting more diverse set of developers
>  3. Getting more diverse set of committers/PMC
> 
> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> aware of?
> 
>  No
> 
> How has the community developed since the last report?
> 
>  We are seeing traction in the community with a number of new users
>  requesting improvements in deployment and management of applications
>  using Slider.  Patches were submitted by a couple of new contributors.
>  We still have to get those people into long term coding, and then bring
>  them in to the committer group.
> 
> How has the project developed since the last report?
> 
>  We released a slider-0.81.1-incubating bug fix release with 30 JIRAs
>  resolved and have begun discussing the 0.90.0 release.
> 
> Date of last release:
> 
>  2015-10-29 slider-0.81.1-incubating
> 
> When were the last committers or PMC members elected?
> 
>  2015-07-07: Yu (Thomas) Liu

[ANNOUNCE] Apache Slider 0.81.1-incubating

2015-11-02 Thread Jon Maron


The Apache Slider team is proud to announce Apache Slider incubation release
version 0.81.1-incubating.

Apache Slider (incubating) is a YARN application which deploys existing
distributed applications on YARN,
monitors them, and makes them larger or smaller as desired - even while the
application is running.

The release artifacts are available at:
http://www.apache.org/dyn/closer.cgi/incubator/slider/0.81.1-incubating/

To use the artifacts, please use the following documentation:
http://slider.incubator.apache.org/docs/getting_started.html

Release notes available at:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315422&version=12332386

We would like to thank all the contributors that made the release possible.

Regards,
The Slider Team

-

DISCLAIMER

Apache Slider is an effort undergoing incubation at The Apache Software
Foundation (ASF),
sponsored by the Apache Incubator PMC. Incubation is required of all newly
accepted projects
until a further review indicates that the infrastructure, communications,
and decision making
process have stabilized in a manner consistent with other successful ASF
projects. While incubation
status is not necessarily a reflection of the completeness or stability of
the code, it does indicate
that the project has yet to be fully endorsed by the ASF.

Re: Slider-agent can not be started

2015-10-28 Thread Jon Maron

Any idea what Python version the agent hosts are running?  Version 2.7.9 has 
issues, so the max version currently supported is 2.7.8.

— Jon
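
A quick way to check this on an agent host is sketched below. The 2.7.8 ceiling is just the number from my note above, not something read from the agent code, and the function name is made up for illustration.

```python
import sys

# Ceiling taken from the note above; an assumption, not a value from the agent.
MAX_SUPPORTED = (2, 7, 8)


def agent_python_supported(version_info=None):
    """Return True if the given (or current) Python version is at most 2.7.8."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major, minor and micro; ignore release level and serial.
    return tuple(version_info[:3]) <= MAX_SUPPORTED
```

Running it with no arguments under the interpreter the agent uses tells you whether that host is within the supported range.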

On Oct 28, 2015, at 6:27 AM, Rohith Sharma K S wrote:

Hi folks,

Ping.. Any idea why slider-agent cannot connect to ZK?

I tried the develop branch, and the slider agent logs the same error!!

The ZK quorum is configured properly and ZK is running at the 
10.18.130.110:54000 address.  I am using zookeeper-3.4.6.

Error log :

ERROR 2015-10-06 13:38:31,077 Registry.py:63 - Could not connect to zk registry 
at /registry/users/rohith/services/org-apache-slider/t6 in quorum 
10.18.130.110:54000. Error: 'NoneType' object has no attribute 'strip'
INFO 2015-10-06 13:38:31,078 Registry.py:69 - AM Host = , AM Secured Port = , 
ping port =
INFO 2015-10-06 13:38:31,078 main.py:259 - Unable to extract AM host details 
from ZK, retrying ...


Any help is appreciated☺

Thanks & Regards
Rohith Sharma K S

From: Rohith Sharma K S
Sent: 06 October 2015 14:39
To: 'dev@slider.incubator.apache.org'
Subject: Slider-agent can not be started

Hi

I am trying to deploy HBase using Slider. I created an HBase package with the 
Hadoop-2.6 distribution. I submitted the job using “python slider create t6 
--template appConfig.json --resources resources.json”.
The Slider master started running but launching the HMaster failed. The 
attached slider-agent.log has the following error.

I suspect I am missing some configuration. Could anyone help me understand why 
the error below occurs, and how to resolve it? The full slider-agent.log is 
attached to this mail.

INFO 2015-10-06 13:38:21,067 main.py:259 - Unable to extract AM host details 
from ZK, retrying ...
ERROR 2015-10-06 13:38:31,077 Registry.py:63 - Could not connect to zk registry 
at /registry/users/rohith/services/org-apache-slider/t6 in quorum 
10.18.130.110:54000. Error: 'NoneType' object has no attribute 'strip'
INFO 2015-10-06 13:38:31,078 Registry.py:69 - AM Host = , AM Secured Port = , 
ping port =
INFO 2015-10-06 13:38:31,078 main.py:259 - Unable to extract AM host details 
from ZK, retrying ...
INFO 2015-10-06 13:38:41,081 Controller.py:140 - Registering with the server at 
https://localhost:8441/ws/v1/slider/agents/container_e04_1444115477719_0002_01_16___HBASE_MASTER/register
 with data '{"tags": "", "timestamp": 1444118921081, "expectedState": 0, 
"responseId": -1, "actualState": 0, "logFolders": {}, "agentVersion": "1", 
"allocatedPorts": {}, "appVersion": null, "publicHostname": 
"host-10-18-130-110", "label": 
"container_e04_1444115477719_0002_01_16___HBASE_MASTER"}'
INFO 2015-10-06 13:38:41,081 security.py:89 - SSL Connect being called.. 
connecting to the server
ERROR 2015-10-06 13:38:41,082 Controller.py:625 - Exception raised
Traceback (most recent call last):
  File 
"/home/rohith/os/tmp2.6/nm-local-dir/usercache/rohith/appcache/application_1444115477719_0002/filecache/24/slider-agent.tar.gz/slider-agent/agent/Controller.py",
 line 619, in sendRequest
self.cachedconnect = security.CachedHTTPSConnection(self.config)
  File 
"/home/rohith/os/tmp2.6/nm-local-dir/usercache/rohith/appcache/application_1444115477719_0002/filecache/24/slider-agent.tar.gz/slider-agent/agent/security.py",
 line 106, in __init__
self.connect()
  File 
"/home/rohith/os/tmp2.6/nm-local-dir/usercache/rohith/appcache/application_1444115477719_0002/filecache/24/slider-agent.tar.gz/slider-agent/agent/security.py",
 line 111, in connect
self.httpsconn.connect()
  File 
"/home/rohith/os/tmp2.6/nm-local-dir/usercache/rohith/appcache/application_1444115477719_0002/filecache/24/slider-agent.tar.gz/slider-agent/agent/security.py",
 line 49, in connect
sock=self.create_connection()
  File 
"/home/rohith/os/tmp2.6/nm-local-dir/usercache/rohith/appcache/application_1444115477719_0002/filecache/24/slider-agent.tar.gz/slider-agent/agent/security.py",
 line 90, in create_connection
sock = socket.create_connection((self.host, self.port), 60)
  File "/usr/lib64/python2.6/socket.py", line 512, in create_connection




Thanks & Regards
Rohith Sharma K S

[VOTE] Release Apache Slider 0.81.1-incubating

2015-10-26 Thread Jon Maron

Hello,

This is a call for a vote for releasing Apache Slider 0.81.1-incubating.

This is a source release.

Summary of fixes: http://s.apache.org/sgG
Release Notes:  http://s.apache.org/ZZP
Vote thread: http://s.apache.org/Ejv
Results: http://s.apache.org/ufJ

Staged artifacts:
https://repository.apache.org/content/repositories/orgapacheslider-1008/org/apache/slider

Git source:
https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h=a9b1d659c642ab7eb7b21ecbae97f805259c9f9f
SHA1: a9b1d659c642ab7eb7b21ecbae97f805259c9f9f 
Tag: slider-0.81.1-incubating

PGP key:
http://pgp.mit.edu:11371/pks/lookup?op=vindex=jma...@apache.org

Basic build/test instructions:
http://slider.incubator.apache.org/developing/building.html

Please vote on releasing this package as Apache Slider 0.81.1-incubating.

This vote will be open for 72 hours.

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

Thank You,
The Apache Slider Team

[CANCEL] [VOTE] Apache Slider Incubating Release 0.81.0-incubating

2015-10-22 Thread Jon Maron

The issues below seemed to merit a new release candidate.  Be on the lookout 
for a vote on release 0.81.1-incubating.

> On Oct 21, 2015, at 7:03 PM, Gour Saha <gs...@hortonworks.com> wrote:
> 
> -0
> 
> Agree with Josh.
> 
> 1. Let's fix copyright year in NOTICE, add DEPENDENCIES to the rat
> exclusion list and remove all pom.xml.versionBackup files
> 2. After the above fixes we need to create a new release candidate with
> version 0.81.1-incubating and a new tag slider-0.81.1-incubating (note it
> is 0.81.1 instead of 0.81.0). We can do all this work in the same branch
> "branches/branch-0.81". However let's make sure that the new tag
> slider-0.81.1-incubating and branches/branch-0.81 eventually have the same
> SHA.
> 3. After taking care of 1 and 2 let's send out a new vote for 0.81.1
> 4. Let's file a Slider bug to remove busy.gif and hadoop-st.png from
> develop branch such that they get cleaned up for next release
> 5. Let's also capture in the bug opened in step 4 above to remove
> copyright lines from the license headers of python-wrap and storm-slider
> 
> 
> Additionally, the following were tested and found ok -
> 1. Verified pgp
> 2. Verified md5s & shas of tar and zip
> 3. Built from source (from tar and zip)
> 4. Rat check ok (except for 1 file, DEPENDENCIES - which will be taken
> care of in next release)
> 5. Ran unit tests successfully (takes about 22 mins in my local mac)
> 
> 
> -Gour
> 
> On 10/21/15, 10:22 AM, "Josh Elser" <els...@apache.org> wrote:
> 
>> -0
>> 
>> Things I think should be fixed now:
>> 
>> * There's some confusion with the Git tag and the SHA1. The SHA1 appears
>> to be what was built, but the tag (0.81.0-incubating doesn't exist, so
>> assuming you meant slider-0.81.0-incubating) doesn't match the SHA1
>> (it's at 342061f7ca5afb55f172b8d4a432497a8f8b2560 instead of
>> 38decaa05de8d962053e47040bab910cdb00f04d).
>> 
>> * source tarball contains pom.xml.versionBackup files. I'd assume they
>> were erroneously included.
>> 
>> * Copyright years in NOTICE appear incorrect. Should be 2014-2015, not
>> 2015-2016.
>> 
>> Other things I think we should fix for the next release:
>> 
>> * I don't see any copyright notice for
>> ./slider-core/src/main/resources/webapps/static/busy.gif. Looks like it
>> came in during the initial import. Unless Steve happened to make it or
>> remembers where it came from, there's some concern about someone else
>> owning it. It also doesn't appear to be used (grep doesn't find any
>> references to it, anyways), so perhaps it can just be deleted?
>> 
>> * ./slider-core/src/main/resources/webapps/static/hadoop-st.png also
>> seems to not be attributed to Hadoop and is unreferenced in code. Also a
>> candidate for deletion?
>> 
>> * Copyright year exists in some files in the license header (and
>> shouldn't/doesn't need to, afaik)
>> 
>> ** ./slider-agent/src/test/python/python-wrap
>> ** ./app-packages/storm/package/files/storm-slider
>> 
>> * DEPENDENCIES is missing from top-level pom.xml RAT plugin exclusions
>> (causes `mvn verify -Prat -DskipTests` to fail on source release).
>> 
>> Jon Maron wrote:
>>> Hello,
>>> This is a call for a vote on Apache Slider 0.81.0-incubating release.
>>> 
>>> This is a source release.
>>> 
>>> The list of all issues fixed: http://s.apache.org/ZnA
>>> 
>>> Staged artifacts:
>>> https://repository.apache.org/content/repositories/orgapacheslider-1007/
>>> Source zip: 
>>> https://repository.apache.org/content/repositories/orgapacheslider-1007/o
>>> rg/apache/slider/slider/0.81.0-incubating/slider-0.81.0-incubating-source
>>> -release.zip
>>> Source tar.gz: 
>>> https://repository.apache.org/content/repositories/orgapacheslider-1007/o
>>> rg/apache/slider/slider/0.81.0-incubating/slider-0.81.0-incubating-source
>>> -release.tar.gz
>>> 
>>> Git source:
>>> 
>>> https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h
>>> =38decaa05de8d962053e47040bab910cdb00f04d
>>> SHA1: 38decaa05de8d962053e47040bab910cdb00f04d
>>> Tag: 0.81.0-incubating
>>> 
>>> PGP key:
>>> http://pgp.mit.edu:11371/pks/lookup?op=vindex=jma...@apache.org
>>> 
>>> Build/test instructions at:
>>> http://slider.incubator.apache.org/developing/building.html
>>> 
>>> 
>>> Vote will be open for 72 hours
>>> 
>>> 
>>> [ ] +1 approve
>>> [ ] +0 no opinion
>>> [ ] -1 disapprove (and reason why)
>>> 
>>> 
>> 
> 
>

[VOTE] Apache Slider Incubating Release 0.81.1-incubating

2015-10-22 Thread Jon Maron

Hello,
This is a call for a vote on Apache Slider 0.81.1-incubating release.

This is a source release.

The list of all issues fixed: http://s.apache.org/ZnA

Staged artifacts: 
https://repository.apache.org/content/repositories/orgapacheslider-1008/
Source zip: 
https://repository.apache.org/content/repositories/orgapacheslider-1008/org/apache/slider/slider/0.81.1-incubating/slider-0.81.1-incubating-source-release.zip
Source tar.gz: 
https://repository.apache.org/content/repositories/orgapacheslider-1008/org/apache/slider/slider/0.81.1-incubating/slider-0.81.1-incubating-source-release.tar.gz

Git source:
https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h=a9b1d659c642ab7eb7b21ecbae97f805259c9f9f
SHA1: a9b1d659c642ab7eb7b21ecbae97f805259c9f9f 
Tag: slider-0.81.1-incubating

PGP key:
http://pgp.mit.edu:11371/pks/lookup?op=vindex=jma...@apache.org

Build/test instructions at:
http://slider.incubator.apache.org/developing/building.html


Vote will be open for 72 hours 


[ ] +1 approve 
[ ] +0 no opinion 
[ ] -1 disapprove (and reason why)

Re: Slider-develop - Build # 716 - Still Failing

2015-10-13 Thread Jon Maron

SLIDER-777 and SLIDER-935 are slated for 0.81 as well.  

Gour,

  I believe you’ve committed the fix for 777?

Steve,

  Think you’ll get to 935?

— Jon

> On Oct 13, 2015, at 3:00 PM, Gour Saha  wrote:
> 
> Thomas,
> If we know the fix, let's fix it. Jon will be starting the work on 0.81.0 
> release on Oct 15. 
> 
> -Gour
> 
> - Sent from my iPhone
> 
>> On Oct 13, 2015, at 11:12 AM, "Yu Liu"  wrote:
>> 
>> Agree with Steve.
>> This test case failure is not one of the partial merge issues we saw 
>> previously
>> 
>> To fix it:
>> The simplest fix would be checking if actionQueue is created in controller 
>> or not, just for fixing the test case.
>> It also doesn't break the protocol: actionQueue is not created until 
>> controller.run()
>> 
>> Please let me know what you think
>> 
>> Thank you
>> 
>> 
>> From: Steve Loughran 
>> Sent: Tuesday, October 13, 2015 9:17 AM
>> To: dev@slider.incubator.apache.org
>> Subject: Re: Slider-develop - Build # 716 - Still Failing
>> 
>>> On 12 Oct 2015, at 19:45, Apache Jenkins Server  
>>> wrote:
>>> 
>>> The Apache Jenkins build system has built Slider-develop (build #716)
>>> 
>>> Status: Still Failing
>>> 
>>> Check console output at https://builds.apache.org/job/Slider-develop/716/ 
>>> to view the results.
>> 
>> looks like a pytest failure
>> 
>> ==
>> ERROR: test_signal_handler (TestMain.TestMain)
>> --
>> Traceback (most recent call last):
>> File 
>> "/home/jenkins/jenkins-slave/workspace/Slider-develop/slider-agent/src/test/python/mock/mock.py",
>>  line 1199, in patched
>>   return func(*args, **keywargs)
>> File 
>> "/home/jenkins/jenkins-slave/workspace/Slider-develop/slider-agent/src/test/python/agent/TestMain.py",
>>  line 60, in test_signal_handler
>>   main.signal_handler("signum", "frame")
>> File 
>> "/home/jenkins/jenkins-slave/workspace/Slider-develop/slider-agent/src/main/python/agent/main.py",
>>  line 59, in signal_handler
>>   tmpdir = controller.actionQueue.dockerManager.stop_container()
>> AttributeError: 'Controller' object has no attribute 'actionQueue'
>> 
>> 
>> Looking at the code, Controller only gets an actionQueue in its run() 
>> method, so maybe there's a race condition: if the stop() operation kicks in 
>> too early, there's no field to stop.
>> 
>> 
>> Filed as SLIDER-946
>> 
>
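
The guard suggested in the quoted discussion (check whether actionQueue exists before using it) might look roughly like the sketch below. It is a minimal, self-contained illustration and not the actual slider-agent code: the Controller, ActionQueue and DockerManager classes are stand-ins, and stop_container_if_started is a hypothetical helper name.

```python
class DockerManager:
    """Stand-in for the agent's docker manager (hypothetical)."""

    def stop_container(self):
        return "stopped"


class ActionQueue:
    """Stand-in for the agent's action queue (hypothetical)."""

    def __init__(self):
        self.dockerManager = DockerManager()


class Controller:
    """Stand-in controller: actionQueue only exists after run()."""

    def run(self):
        self.actionQueue = ActionQueue()


def stop_container_if_started(controller):
    """Stop the container only if run() has already created actionQueue."""
    action_queue = getattr(controller, "actionQueue", None)
    if action_queue is None:
        # The signal arrived before run(); there is nothing to stop yet.
        return None
    return action_queue.dockerManager.stop_container()
```

A signal handler that calls such a helper no longer raises AttributeError when it fires before run(), and behaviour after run() is unchanged.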

Re: Slider-develop - Build # 716 - Still Failing

2015-10-13 Thread Jon Maron


> On Oct 13, 2015, at 3:09 PM, Jon Maron <jma...@hortonworks.com> wrote:
> 
> SLIDER-777 and SLIDER-935 are slated for 0.81 as well.  
> 
> Gour,
> 
>  I believe you’ve committed the fix for 777?
> 
> Steve,
> 
>  Think you’ll get to 935?

It looks like you may have committed a fix already?

> 
> — Jon
> 
>> On Oct 13, 2015, at 3:00 PM, Gour Saha <gs...@hortonworks.com> wrote:
>> 
>> Thomas,
>> If we know the fix, let's fix it. Jon will be starting the work on 0.81.0 
>> release on Oct 15. 
>> 
>> -Gour
>> 
>> - Sent from my iPhone
>> 
>>> On Oct 13, 2015, at 11:12 AM, "Yu Liu" <y...@hortonworks.com> wrote:
>>> 
>>> Agree with Steve.
>>> This test case failure is not one of the partial merge issues we saw 
>>> previously
>>> 
>>> To fix it:
>>> The simplest fix would be checking if actionQueue is created in controller 
>>> or not, just for fixing the test case.
>>> It also doesn't break the protocol: actionQueue is not created until 
>>> controller.run()
>>> 
>>> Please let me know what you think
>>> 
>>> Thank you
>>> 
>>> 
>>> From: Steve Loughran <ste...@hortonworks.com>
>>> Sent: Tuesday, October 13, 2015 9:17 AM
>>> To: dev@slider.incubator.apache.org
>>> Subject: Re: Slider-develop - Build # 716 - Still Failing
>>> 
>>>> On 12 Oct 2015, at 19:45, Apache Jenkins Server 
>>>> <jenk...@builds.apache.org> wrote:
>>>> 
>>>> The Apache Jenkins build system has built Slider-develop (build #716)
>>>> 
>>>> Status: Still Failing
>>>> 
>>>> Check console output at https://builds.apache.org/job/Slider-develop/716/ 
>>>> to view the results.
>>> 
>>> looks like a pytest failure
>>> 
>>> ==
>>> ERROR: test_signal_handler (TestMain.TestMain)
>>> --
>>> Traceback (most recent call last):
>>> File 
>>> "/home/jenkins/jenkins-slave/workspace/Slider-develop/slider-agent/src/test/python/mock/mock.py",
>>>  line 1199, in patched
>>>  return func(*args, **keywargs)
>>> File 
>>> "/home/jenkins/jenkins-slave/workspace/Slider-develop/slider-agent/src/test/python/agent/TestMain.py",
>>>  line 60, in test_signal_handler
>>>  main.signal_handler("signum", "frame")
>>> File 
>>> "/home/jenkins/jenkins-slave/workspace/Slider-develop/slider-agent/src/main/python/agent/main.py",
>>>  line 59, in signal_handler
>>>  tmpdir = controller.actionQueue.dockerManager.stop_container()
>>> AttributeError: 'Controller' object has no attribute 'actionQueue'
>>> 
>>> 
>>> Looking at the code, Controller only gets an actionQueue in its run() 
>>> method, so maybe there's a race condition: if the stop() operation kicks in 
>>> too early, there's no field to stop.
>>> 
>>> 
>>> Filed as SLIDER-946
>>> 
>> 
>

Re: new committer

2015-07-07 Thread Jon Maron

Welcome!!

— Jon

 On Jul 7, 2015, at 12:49 PM, Billie Rinaldi billie.rina...@gmail.com wrote:
 
 Welcome Thomas Liu, a new committer and PPMC member for Apache Slider!

Re: Secured Zookeeper

2015-06-05 Thread Jon Maron

On Jun 5, 2015, at 4:12 PM, Billie Rinaldi billie.rina...@gmail.com wrote:

On Fri, Jun 5, 2015 at 12:58 PM, Steve Loughran ste...@hortonworks.com
wrote:

ooh, now ZK has interesting and complicated security. I spent more time
writing the kerberos ZK tests for the yarn registry than most of the
registry code itself, from which I came out with
-a fear of kerberos

Kerberophobia?

common and prevalent...

-a fear of its error messages
-not enough understanding of how ZK security works.

On 5 Jun 2015, at 16:16, Lei Guo lei...@huawei.com wrote:

We are trying to use Slider to manage HBase in an environment with
secured zookeeper (Kerberos). Seems there are some issues around both AM
and agent. For example, the kazoo library embedded does not support
Kerberos credential.

Just want to confirm that secured Zookeeper is not supported yet.

it should be.

The registry can be set up to be world readable, and writeable only by the
user who is starting the jobs

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/registry/registry-security.html

if your hadoop installation has YARN-2571 applied, this is done
automatically for you by the RM. I managed to get this into HDP 2.2, but
it's not in ASF Hadoop (one of the few differences)

without that, there is a way from the command line to give a user
permissions (and only that user).

Once the registry is set up, the AM will update its path under
/users/${USERNAME} with
-the URL used by the agents to find the AM
-any bindings the applications publish

There's also a bit of ZK code in the slider client which creates a
zookeeper path for an HBase cluster, under
/services/slider/users/${USERNAME}/${CLUSTERNAME}

I think that's the bit most likely to break on a secure ZK cluster, unless
you set up /services/slider/users/${USERNAME} to be writeable by that user.

Does this help? If not, we'll do what we can to get this to work. It
should work on a secure ZK cluster
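
To make the path layout described above concrete, here is a small sketch that builds the ZooKeeper paths mentioned in this message. The two root prefixes are taken from the text; the function names are hypothetical and nothing here talks to ZooKeeper or reflects Slider's actual client code.

```python
import posixpath

# Roots as quoted in the message above.
REGISTRY_USERS_ROOT = "/users"
SLIDER_USERS_ROOT = "/services/slider/users"


def registry_user_path(username):
    """Registry path the AM updates for a given user."""
    return posixpath.join(REGISTRY_USERS_ROOT, username)


def slider_cluster_path(username, clustername):
    """Path the slider client creates for an HBase cluster."""
    return posixpath.join(SLIDER_USERS_ROOT, username, clustername)


def path_needing_user_write_access(username):
    """Parent path that must be writeable by the user on a secure ZK cluster."""
    return posixpath.join(SLIDER_USERS_ROOT, username)
```

For a user alice and a cluster hbase1 (both made-up names), the client would need write access under /services/slider/users/alice in order to create /services/slider/users/alice/hbase1.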

Re: Need hostname for use in appConfig.json

2015-06-01 Thread Jon Maron

Alternatively, you could also try using the convention used elsewhere in 
hadoop:  username/_HOST@domain.  _HOST is generally replaced at runtime with 
the host name.  Most applications currently honor that convention (Internally 
calling SecurityUtil.getServerPrincipal(String principalConfig,String hostname))
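
As an illustration of the _HOST convention, the sketch below mimics in Python what the substitution is understood to do: a three-part principal whose middle component is _HOST gets that component replaced by the lower-cased host name. This is an approximation written for this thread, not the Hadoop implementation; the function name and the fallback to socket.getfqdn() are assumptions.

```python
import re
import socket

HOSTNAME_PATTERN = "_HOST"


def get_server_principal(principal_config, hostname=None):
    """Replace _HOST in 'name/_HOST@REALM' with a lower-cased host name."""
    if not principal_config:
        return principal_config
    parts = re.split(r"[/@]", principal_config)
    if len(parts) != 3 or parts[1] != HOSTNAME_PATTERN:
        # Not of the form name/_HOST@REALM: leave it untouched.
        return principal_config
    # Assumption: fall back to the local FQDN when no usable host is given.
    fqdn = hostname if hostname and hostname != "0.0.0.0" else socket.getfqdn()
    return "%s/%s@%s" % (parts[0], fqdn.lower(), parts[2])
```

With this sketch, HTTP/_HOST@EXAMPLE.COM resolved on Node1.Example.com becomes HTTP/node1.example.com@EXAMPLE.COM, while a principal without _HOST is returned unchanged.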


 On Jun 1, 2015, at 12:36 PM, Gour Saha gs...@hortonworks.com wrote:
 
 Have you tried using ${THIS_HOST} in appConfig? Did it not work?
 
 -Gour
 
 On 6/1/15, 9:14 AM, Nathaniel Braun n.br...@criteo.com wrote:
 
 Hi everyone,
 
 We are currently working on the configuration files with Kerberos
 principals in them, and it turns out that the Kerberos principal is
 linked to the hostname, so we need it.
 
 What we would like to do is something like that:
 
 
 1.   In appConfig.json
 
 Set the global hostname: site.global.hostname: ${THIS_HOST}
 
 
 2.   In our default httpfs-site configuration file:
 
 Read that value using the following piece of code:
 
 <name>httpfs.authentication.kerberos.principal</name><value>HTTP/${@//site/global/hostname}</value>
 <name>httpfs.hadoop.authentication.kerberos.principal</name><value>HTTP/${@//site/global/hostname}</value>
 
 For this to work, we need the THIS_HOST variable to work in the
 appConfig.json file.
 
 How can we achieve such a feature?
 
 Thanks & regards,
 Nathaniel

Re: Need hostname for use in appConfig.json

2015-06-01 Thread Jon Maron


 On Jun 1, 2015, at 1:37 PM, Nathaniel Braun n.br...@criteo.com wrote:
 
 By the way, regarding the _HOST convention: I indeed saw that in the HBase 
 package, and it works perfectly.
 
 What I meant was, I'm probably not the only one to need this. It would maybe 
 be good to provide an official way of accomplishing it :)

I assume you mean an official Slider mechanism, since my understanding is that 
the _HOST convention is used throughout other Hadoop related projects.

 
 Thanks
 
 
 
 From: Nathaniel Braun n.br...@criteo.com
 Sent: Monday, June 1, 2015 7:20 PM
 To: dev@slider.incubator.apache.org
 Subject: Re: Need hostname for use in appConfig.json
 
 Thanks for your answer,
 
 I did try using {THIS_HOST}, it didn't work
 
 Regards,
 Nathaniel
 
 
 From: Jon Maron jma...@hortonworks.com
 Sent: Monday, June 1, 2015 6:39:41 PM
 To: dev@slider.incubator.apache.org
 Subject: Re: Need hostname for use in appConfig.json
 
 Alternatively, you could also try using the convention used elsewhere in 
 hadoop:  username/_HOST@domain.  _HOST is generally replaced at runtime with 
 the host name.  Most applications currently honor that convention (Internally 
 calling SecurityUtil.getServerPrincipal(String principalConfig,String 
 hostname))
 
 
 On Jun 1, 2015, at 12:36 PM, Gour Saha gs...@hortonworks.com wrote:
 
 Have you tried using ${THIS_HOST} in appConfig? Did it not work?
 
 -Gour
 
 On 6/1/15, 9:14 AM, Nathaniel Braun n.br...@criteo.com wrote:
 
 Hi everyone,
 
 We are currently working on the configuration files with Kerberos
 principals in them, and it turns out that the Kerberos principal is
 linked to the hostname, so we need it.
 
 What we would like to do is something like that:
 
 
 1.   In appConfig.json
 
 Set the global hostname: site.global.hostname: ${THIS_HOST}
 
 
 2.   In our default httpfs-site configuration file:
 
 Read that value using the following piece of code:
 
 <name>httpfs.authentication.kerberos.principal</name><value>HTTP/${@//site/global/hostname}</value>
 <name>httpfs.hadoop.authentication.kerberos.principal</name><value>HTTP/${@//site/global/hostname}</value>
 
 For this to work, we need the THIS_HOST variable to work in the
 appConfig.json file.
 
 How can we achieve such a feature?
 
 Thanks & regards,
 Nathaniel

Re: Keytab issue with Hbase secure

2015-05-30 Thread Jon Maron

Are you running on JDK 8?  Could be this:  
https://issues.apache.org/jira/browse/HADOOP-10786

 On May 29, 2015, at 9:46 AM, Yohan Bismuth yohan.bismu...@gmail.com wrote:
 
 I just recompiled slider after i replaced the call to isFromKeytab()
 in validateLoginUser by isLoginKeytabBased(). Then i redeployed it on my
 gateway and launched my application.
 The error is the same as before.
 
 On Fri, May 29, 2015 at 3:36 PM, Jon Maron jma...@hortonworks.com wrote:
 
 Please describe the full procedure you used to retest.
 
 On May 29, 2015, at 9:28 AM, Yohan Bismuth yohan.bismu...@gmail.com
 wrote:
 
 ...but i got the same problem using this method instead
 
 On Fri, May 29, 2015 at 3:25 PM, Yohan Bismuth yohan.bismu...@gmail.com
 
 wrote:
 
 Oops, my bad, I was not looking at the right thing.
 
 On Fri, May 29, 2015 at 3:13 PM, Jon Maron jma...@hortonworks.com
 wrote:
 
 It’s declared as public static:
 
 public synchronized static boolean isLoginKeytabBased()
 
 at least the version I’m looking at
 
 On May 29, 2015, at 8:52 AM, Yohan Bismuth yohan.bismu...@gmail.com
 wrote:
 
 Mmh I can't compile this. I don't think you can use
 isLoginKeytabBased()
 since isKeytab is private in ugi.
 
 On Fri, May 29, 2015 at 1:56 PM, Jon Maron jma...@hortonworks.com
 wrote:
 
 Looks like you’ve found a bug:  validateLoginUser should be calling
 isLoginKeytabBased(), not isFromKeytab().  Would you mind filing a JIRA?
 
 — Jon
 
 On May 29, 2015, at 5:24 AM, Yohan Bismuth 
 yohan.bismu...@gmail.com
 wrote:
 
 Btw, I've tried using java7 and java8: same issue.
 I'm correctly logged as the principal of my keytab, and i can submit
 jobs
 (like a wordcount) using this keytab.
 
 If i remove this line from the code:
 
 validateLoginUser(UserGroupInformation.getLoginUser());
 
 everything seems to work fine.
 
 hasKerberosCredentials returns true, so the login must be based on a
 kerberos ticket.  Perhaps it has expired?
 
 Well, this is the point here, i don't want the login to be based on
 a
 kerberos ticket (because it would mean the ugi has the wrong flag
 set
 to
 true). I want the login to be based on a kerberos key.
 
 
 2015-05-29 8:49 GMT+02:00 Jean-Baptiste Note jbn...@gmail.com:
 
 That's not a big deal for us; it's packaged quick-and-dirty for now.
 Maybe that is also the fix! We should run git blame to understand why
 the exception is there. Maybe it also triggers in a legitimate case?
 
 Follow-up question: you are submitting the job as
 y.bismuth@HPC.CRITEO.PREPROD and not y.bism...@criteois.lan, right? (It's
 the same question as "did you run kinit", just subtler.)
 Have you tried with a krb5.conf that sets HPC.CRITEO.PREPROD as the
 default realm on the gateway?
 
 JB
 
 2015-05-28 23:05 GMT+02:00 Yohan Bismuth yohan.bismu...@gmail.com:
 
 Well, that means modifying and recompiling Slider. I'm not sure we want
 that...
 
 2015-05-28 22:20 GMT+02:00 Jean-Baptiste Note jbn...@gmail.com:
 
 OK, forget what I said :)
 You can spell all this out in the thread. If it really works without the
 exception, I don't even see why we're bothering, actually???
 
 JB
 
 
 
 
 
 --
 Jean-Baptiste Note

Re: Keytab issue with Hbase secure

2015-05-29 Thread Jon Maron

That is strange - the login is from a keytab, but the UGI indicates it’s from a 
ticket.  I’ll need to investigate some more.  Can you send me the full log?  I 
imagine the listserv will not allow it.  Thanks!

 On May 29, 2015, at 5:24 AM, Yohan Bismuth yohan.bismu...@gmail.com wrote:
 
 Btw, I've tried using java7 and java8: same issue.
 I'm correctly logged as the principal of my keytab, and i can submit jobs
 (like a wordcount) using this keytab.
 
 If i remove this line from the code:
 
 validateLoginUser(UserGroupInformation.getLoginUser());
 
 everything seems to work fine.
 
 hasKerberosCredentials returns true, so the login must be based on a
 kerberos ticket.  Perhaps it has expired?
 
 Well, this is the point here, i don't want the login to be based on a
 kerberos ticket (because it would mean the ugi has the wrong flag set to
 true). I want the login to be based on a kerberos key.
 
 
 2015-05-29 8:49 GMT+02:00 Jean-Baptiste Note jbn...@gmail.com:
 
 That's not a big deal for us; it's packaged quick-and-dirty for now.
 Maybe that is also the fix! We should run git blame to understand why
 the exception is there. Maybe it also triggers in a legitimate case?
 
 Follow-up question: you are submitting the job as
 y.bismuth@HPC.CRITEO.PREPROD and not y.bism...@criteois.lan, right? (It's
 the same question as "did you run kinit", just subtler.)
 Have you tried with a krb5.conf that sets HPC.CRITEO.PREPROD as the
 default realm on the gateway?
 
 JB
 
 2015-05-28 23:05 GMT+02:00 Yohan Bismuth yohan.bismu...@gmail.com:
 
 Well, that means modifying and recompiling Slider. I'm not sure we want
 that...
 
 2015-05-28 22:20 GMT+02:00 Jean-Baptiste Note jbn...@gmail.com:
 
 OK, forget what I said :)
 You can spell all this out in the thread. If it really works without the
 exception, I don't even see why we're bothering, actually???
 
 JB
 
 
 
 
 
 --
 Jean-Baptiste Note

Re: Keytab issue with Hbase secure

2015-05-29 Thread Jon Maron

Looks like you’ve found a bug:  validateLoginUser should be calling 
isLoginKeytabBased(), not isFromKeytab().  Would you mind filing a JIRA?

— Jon

 On May 29, 2015, at 5:24 AM, Yohan Bismuth yohan.bismu...@gmail.com wrote:
 
 Btw, I've tried using java7 and java8: same issue.
 I'm correctly logged as the principal of my keytab, and i can submit jobs
 (like a wordcount) using this keytab.
 
 If i remove this line from the code:
 
 validateLoginUser(UserGroupInformation.getLoginUser());
 
 everything seems to work fine.
 
 hasKerberosCredentials returns true, so the login must be based on a
 kerberos ticket.  Perhaps it has expired?
 
 Well, this is the point here, i don't want the login to be based on a
 kerberos ticket (because it would mean the ugi has the wrong flag set to
 true). I want the login to be based on a kerberos key.
 
 
 2015-05-29 8:49 GMT+02:00 Jean-Baptiste Note jbn...@gmail.com:
 
 That's not a big deal for us; it's packaged quick-and-dirty for now.
 Maybe that is also the fix! We should run git blame to understand why
 the exception is there. Maybe it also triggers in a legitimate case?
 
 Follow-up question: you are submitting the job as
 y.bismuth@HPC.CRITEO.PREPROD and not y.bism...@criteois.lan, right? (It's
 the same question as "did you run kinit", just subtler.)
 Have you tried with a krb5.conf that sets HPC.CRITEO.PREPROD as the
 default realm on the gateway?
 
 JB
 
 2015-05-28 23:05 GMT+02:00 Yohan Bismuth yohan.bismu...@gmail.com:
 
 Well, that means modifying and recompiling Slider. I'm not sure we want
 that...
 
 2015-05-28 22:20 GMT+02:00 Jean-Baptiste Note jbn...@gmail.com:
 
 OK, forget what I said :)
 You can spell all this out in the thread. If it really works without the
 exception, I don't even see why we're bothering, actually???
 
 JB
 
 
 
 
 
 --
 Jean-Baptiste Note

Re: Keytab issue with Hbase secure

2015-05-29 Thread Jon Maron

Please describe the full procedure you used to retest.

 On May 29, 2015, at 9:28 AM, Yohan Bismuth yohan.bismu...@gmail.com wrote:
 
 ...but i got the same problem using this method instead
 
 On Fri, May 29, 2015 at 3:25 PM, Yohan Bismuth yohan.bismu...@gmail.com
 wrote:
 
 Oops, my bad, I was not looking at the right thing.
 
 On Fri, May 29, 2015 at 3:13 PM, Jon Maron jma...@hortonworks.com wrote:
 
 It’s declared as public static:
 
 public synchronized static boolean isLoginKeytabBased()
 
 at least the version I’m looking at
 
 On May 29, 2015, at 8:52 AM, Yohan Bismuth yohan.bismu...@gmail.com
 wrote:
 
 Mmh I can't compile this. I don't think you can use isLoginKeytabBased()
 since isKeytab is private in ugi.
 
 On Fri, May 29, 2015 at 1:56 PM, Jon Maron jma...@hortonworks.com
 wrote:
 
 Looks like you’ve found a bug:  validateLoginUser should be calling
 isLoginKeytabBased(), not isFromKeytab().  Would you mind filing a JIRA?
 
 — Jon
 
 On May 29, 2015, at 5:24 AM, Yohan Bismuth yohan.bismu...@gmail.com
 wrote:
 
 Btw, I've tried using java7 and java8: same issue.
 I'm correctly logged as the principal of my keytab, and i can submit
 jobs
 (like a wordcount) using this keytab.
 
 If i remove this line from the code:
 
 validateLoginUser(UserGroupInformation.getLoginUser());
 
 everything seems to work fine.
 
 hasKerberosCredentials returns true, so the login must be based on a
 kerberos ticket.  Perhaps it has expired?
 
 Well, this is the point here, i don't want the login to be based on a
 kerberos ticket (because it would mean the ugi has the wrong flag set
 to
 true). I want the login to be based on a kerberos key.
 
 
 2015-05-29 8:49 GMT+02:00 Jean-Baptiste Note jbn...@gmail.com:
 
 That's not a big deal for us; it's packaged quick-and-dirty for now.
 Maybe that is also the fix! We should run git blame to understand why
 the exception is there. Maybe it also triggers in a legitimate case?
 
 Follow-up question: you are submitting the job as
 y.bismuth@HPC.CRITEO.PREPROD and not y.bism...@criteois.lan, right? (It's
 the same question as "did you run kinit", just subtler.)
 Have you tried with a krb5.conf that sets HPC.CRITEO.PREPROD as the
 default realm on the gateway?
 
 JB
 
 2015-05-28 23:05 GMT+02:00 Yohan Bismuth yohan.bismu...@gmail.com:
 
 Well, that means modifying and recompiling Slider. I'm not sure we want
 that...
 
 2015-05-28 22:20 GMT+02:00 Jean-Baptiste Note jbn...@gmail.com:
 
 OK, forget what I said :)
 You can spell all this out in the thread. If it really works without the
 exception, I don't even see why we're bothering, actually???
 
 JB
 
 
 
 
 
 --
 Jean-Baptiste Note

Re: Keytab issue with Hbase secure

2015-05-29 Thread Jon Maron

It’s declared as public static:

public synchronized static boolean isLoginKeytabBased()

at least the version I’m looking at

 On May 29, 2015, at 8:52 AM, Yohan Bismuth yohan.bismu...@gmail.com wrote:
 
 Mmh I can't compile this. I don't think you can use isLoginKeytabBased()
 since isKeytab is private in ugi.
 
 On Fri, May 29, 2015 at 1:56 PM, Jon Maron jma...@hortonworks.com wrote:
 
 Looks like you’ve found a bug:  validateLoginUser should be calling
 isLoginKeytabBased(), not isFromKeytab().  Would you mind filing a JIRA?
 
 — Jon
 
 On May 29, 2015, at 5:24 AM, Yohan Bismuth yohan.bismu...@gmail.com
 wrote:
 
 Btw, I've tried using java7 and java8: same issue.
 I'm correctly logged as the principal of my keytab, and i can submit jobs
 (like a wordcount) using this keytab.
 
 If i remove this line from the code:
 
 validateLoginUser(UserGroupInformation.getLoginUser());
 
 everything seems to work fine.
 
 hasKerberosCredentials returns true, so the login must be based on a
 kerberos ticket.  Perhaps it has expired?
 
 Well, this is the point here, i don't want the login to be based on a
 kerberos ticket (because it would mean the ugi has the wrong flag set to
 true). I want the login to be based on a kerberos key.
 
 
 2015-05-29 8:49 GMT+02:00 Jean-Baptiste Note jbn...@gmail.com:
 
 That's not a big deal for us; it's packaged quick-and-dirty for now.
 Maybe that is also the fix! We should run git blame to understand why
 the exception is there. Maybe it also triggers in a legitimate case?
 
 Follow-up question: you are submitting the job as
 y.bismuth@HPC.CRITEO.PREPROD and not y.bism...@criteois.lan, right? (It's
 the same question as "did you run kinit", just subtler.)
 Have you tried with a krb5.conf that sets HPC.CRITEO.PREPROD as the
 default realm on the gateway?
 
 JB
 
 2015-05-28 23:05 GMT+02:00 Yohan Bismuth yohan.bismu...@gmail.com:
 
 Well, that means modifying and recompiling Slider. I'm not sure we want
 that...
 
 2015-05-28 22:20 GMT+02:00 Jean-Baptiste Note jbn...@gmail.com:
 
 OK, forget what I said :)
 You can spell all this out in the thread. If it really works without the
 exception, I don't even see why we're bothering, actually???
 
 JB
 
 
 
 
 
 --
 Jean-Baptiste Note

Re: Keytab issue with Hbase secure

2015-05-29 Thread Jon Maron

I’ll admit to being stumped.  Looking for input from others…

 On May 29, 2015, at 9:46 AM, Yohan Bismuth yohan.bismu...@gmail.com wrote:
 
 I just recompiled slider after i replaced the call to isFromKeytab()
 in validateLoginUser by isLoginKeytabBased(). Then i redeployed it on my
 gateway and launched my application.
 The error is the same as before.
 
 On Fri, May 29, 2015 at 3:36 PM, Jon Maron jma...@hortonworks.com wrote:
 
 Please describe the full procedure you used to retest.
 
 On May 29, 2015, at 9:28 AM, Yohan Bismuth yohan.bismu...@gmail.com
 wrote:
 
 ...but i got the same problem using this method instead
 
 On Fri, May 29, 2015 at 3:25 PM, Yohan Bismuth yohan.bismu...@gmail.com
 
 wrote:
 
 Oops, my bad, I was not looking at the right thing.
 
 On Fri, May 29, 2015 at 3:13 PM, Jon Maron jma...@hortonworks.com
 wrote:
 
 It’s declared as public static:
 
 public synchronized static boolean isLoginKeytabBased()
 
 at least the version I’m looking at
 
 On May 29, 2015, at 8:52 AM, Yohan Bismuth yohan.bismu...@gmail.com
 wrote:
 
 Mmh I can't compile this. I don't think you can use
 isLoginKeytabBased()
 since isKeytab is private in ugi.
 
 On Fri, May 29, 2015 at 1:56 PM, Jon Maron jma...@hortonworks.com
 wrote:
 
 Looks like you’ve found a bug:  validateLoginUser should be calling
 isLoginKeytabBased(), not isFromKeytab().  Would you mind filing a JIRA?
 
 — Jon
 
 On May 29, 2015, at 5:24 AM, Yohan Bismuth 
 yohan.bismu...@gmail.com
 wrote:
 
 Btw, I've tried using java7 and java8: same issue.
 I'm correctly logged as the principal of my keytab, and i can submit
 jobs
 (like a wordcount) using this keytab.
 
 If i remove this line from the code:
 
 validateLoginUser(UserGroupInformation.getLoginUser());
 
 everything seems to work fine.
 
 hasKerberosCredentials returns true, so the login must be based on a
 kerberos ticket.  Perhaps it has expired?
 
 Well, this is the point here, i don't want the login to be based on
 a
 kerberos ticket (because it would mean the ugi has the wrong flag
 set
 to
 true). I want the login to be based on a kerberos key.
 
 
 2015-05-29 8:49 GMT+02:00 Jean-Baptiste Note jbn...@gmail.com:
 
 That's not a big deal for us; it's packaged quick-and-dirty for now.
 Maybe that is also the fix! We should run git blame to understand why
 the exception is there. Maybe it also triggers in a legitimate case?
 
 Follow-up question: you are submitting the job as
 y.bismuth@HPC.CRITEO.PREPROD and not y.bism...@criteois.lan, right? (It's
 the same question as "did you run kinit", just subtler.)
 Have you tried with a krb5.conf that sets HPC.CRITEO.PREPROD as the
 default realm on the gateway?
 
 JB
 
 2015-05-28 23:05 GMT+02:00 Yohan Bismuth yohan.bismu...@gmail.com:
 
 Well, that means modifying and recompiling Slider. I'm not sure we want
 that...
 
 2015-05-28 22:20 GMT+02:00 Jean-Baptiste Note jbn...@gmail.com:
 
 OK, forget what I said :)
 You can spell all this out in the thread. If it really works without the
 exception, I don't even see why we're bothering, actually???
 
 JB
 
 
 
 
 
 --
 Jean-Baptiste Note

Re: Keytab issue with Hbase secure

2015-05-28 Thread Jon Maron

hasKerberosCredentials returns true, so the login must be based on a kerberos 
ticket.  Perhaps it has expired?  I guess you could do a kdestroy followed by 
kinit…

Which slider version are you on?  Is that error message at the bottom something 
you put in the code?  I can’t find it in the codebase.

 On May 28, 2015, at 1:18 PM, Yohan Bismuth yohan.bismu...@gmail.com wrote:
 
 Yes i did
 On 28 May 2015 at 19:15, Jon Maron jma...@hortonworks.com wrote:
 
 Did you actually log in (kinit) prior to invoking the slider client?
 You’ll need to do that in order to establish an identity for the AM launch.
 
 On May 28, 2015, at 12:59 PM, Yohan Bismuth yohan.bismu...@gmail.com
 wrote:
 
 Hi,
 i'm facing an issue with Hbase in secure mode.
 I followed the steps described on
 http://slider.incubator.apache.org/docs/security.html
 
 i created my headless keytab (and the associated principals), which i
 deployed on hdfs and when i start an hbase application, the keytab is
 correctly packaged in the SliderAppMaster container under the keytabs
 folder, but here is the problem:
 
 2015-05-28 16:03:07,037 [main] INFO  appmaster.SliderAppMaster - Connecting to RM at 1024,address tracking URL=http://a4-5d-36-fd-a1-7c.hpc.criteo.preprod:1025
 2015-05-28 16:03:07,065 [main] INFO  appmaster.SliderAppMaster - Slider AM Security Mode: KEYTAB
 2015-05-28 16:03:07,065 [main] INFO  appmaster.SliderAppMaster - Token HDFS_DELEGATION_TOKEN
 2015-05-28 16:03:07,065 [main] INFO  appmaster.SliderAppMaster - Token YARN_AM_RM_TOKEN
 2015-05-28 16:03:07,093 [main] INFO  security.SecurityConfiguration - No host keytab file path specified. Will attempt to retrieve keytab file y.bismuth.keytab as a local resource for the container
 2015-05-28 16:03:07,104 [main] INFO  security.UserGroupInformation - Login successful for user y.bismuth using keytab file /hdfs/wwn/600508b1001c246eb94fcc5ff4d68b4e/yarn/data/usercache/y.bismuth/appcache/application_1432038882976_2039/container_e11_1432038882976_2039_01_01/keytabs/y.bismuth.keytab
 2015-05-28 16:03:07,104 [main] INFO  appmaster.SliderAppMaster - security enabled = true
 
 2015-05-28 16:03:07,104 [main] INFO  appmaster.SliderAppMaster - SOME DEBUG
 2015-05-28 16:03:07,104 [main] INFO  appmaster.SliderAppMaster - UGI = y.bismuth@HPC.CRITEO.PREPROD (auth:KERBEROS)
 2015-05-28 16:03:07,104 [main] INFO  appmaster.SliderAppMaster - isKeytab = false
 2015-05-28 16:03:07,104 [main] INFO  appmaster.SliderAppMaster - tokens = []
 2015-05-28 16:03:07,104 [main] INFO  appmaster.SliderAppMaster - hasKerberosCredentials = true
 2015-05-28 16:03:07,104 [main] INFO  appmaster.SliderAppMaster - credentials = org.apache.hadoop.security.Credentials@1cf2fed4
 2015-05-28 16:03:07,104 [main] INFO  appmaster.SliderAppMaster - authentication method = KERBEROS
 2015-05-28 16:03:07,111 [main] INFO  appmaster.SliderAppMaster - config = Configuration: core-default.xml, core-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, org/apache/slider/slider.xml, mapred-default.xml, mapred-site.xml
 2015-05-28 16:03:07,111 [main] INFO  appmaster.SliderAppMaster - SOME DEBUG
 
 2015-05-28 16:03:07,112 [main] ERROR main.ServiceLauncher - User is not based on a keytab in a secure deployment.
 
 
 So as far as i can see, i'm logging in successfully using the keytab
 packaged in the container, but the flag isKeytab, which should be set to
 true in my UGI (i hope), is not, and i can't figure out why. Because of
 that, my SliderAppMaster crashes.
 
 Any idea ?

Re: slider may report

2015-05-06 Thread Jon Maron

+1

 On May 6, 2015, at 8:58 AM, Billie Rinaldi billie.rina...@gmail.com wrote:
 
 How does this draft look?
 
 Slider
 
 Slider is a collection of tools and technologies to package, deploy, and
 manage long running applications on Apache Hadoop YARN clusters.
 
 Slider has been incubating since 2014-04-29.
 
 Three most important issues to address in the move towards graduation:
 
  1. Building a diverse developer and user community
  2. Achieving broader adoption of the existing code and slider-deployable
 applications (examples: HBase, Accumulo)
  3. Making slider better at deploying other applications, so improving
 takeup.
 
 Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
 aware of?
 
  None
 
 How has the community developed since the last report?
 
  We are getting some questions from users on the list and an occasional
  patch submitted.  A number of features added to the recent releases
  were driven by user needs.
 
 How has the project developed since the last report?
 
  We completed release 0.61.0-incubating in February with more complex
  release artifacts than we had released previously.  We also released
  0.70.1-incubating in March.  Planning is under way for releasing 0.80 in
  May 2015.
 
 Date of last release:
 
  2015-03-31: Slider 0.70.1-incubating
  2015-02-19: Slider 0.61.0-incubating
 
 When were the last committers or PMC members elected?
 
  2014-09-27: Gour Saha, committer and PPMC member

Re: Need help in starting storm on yarn using slider

2015-04-09 Thread Jon Maron

 with status
69


On Wed, Apr 8, 2015 at 7:14 PM, Jon Maron jma...@hortonworks.com
wrote:

Indications seem to be that the AM is started but the AM URI you’re
attempting to attach to may be mistaken or there may be something
preventing the actual connection.  Any chance iptables is enabled?


On Apr 8, 2015, at 3:44 AM, Gour Saha gs...@hortonworks.com wrote:

Jon was right. I think Storm uses ${USER_NAME} for app_user instead of
hard-coding it as yarn, unlike HBase. So either user was fine.

One thing I saw in the AM and RM urls is that they link to
zs-aaa-001.nm.flipkart.com and zs-exp-01.nm.flipkart.com. Can you hand
edit the AM URL to try both the host aliases?

I am not sure if the above will work in which case if you could send
the
entire AM logs then it would be great.

-Gour

- Sent from my iPhone

On Apr 7, 2015, at 11:08 PM, Chackravarthy Esakkimuthu 
chaku.mi...@gmail.com wrote:

Tried running with 'yarn' user, but it remains in same state.
AM link not working, and AM logs are similar.

On Wed, Apr 8, 2015 at 2:14 AM, Gour Saha gs...@hortonworks.com
wrote:

In a non-secured cluster you should run as yarn. Can you do that
and
let
us know how it goes?

Also you can stop your existing storm instance in hdfs user (run as
hdfs
user) by running stop first -
slider stop storm1

-Gour

On 4/7/15, 1:39 PM, Chackravarthy Esakkimuthu
chaku.mi...@gmail.com

wrote:

This is not a secured cluster.
And yes, I used 'hdfs' user while running slider create.

On Wed, Apr 8, 2015 at 2:03 AM, Gour Saha gs...@hortonworks.com

wrote:

Which user are you running the slider create command as? Seems
like
you
are running as hdfs user. Is this a secured cluster?

-Gour

On 4/7/15, 1:06 PM, Chackravarthy Esakkimuthu 
chaku.mi...@gmail.com
wrote:

yes, RM HA has been setup in this cluster.

Active : zs-aaa-001.nm.flipkart.com
Standby : zs-aaa-002.nm.flipkart.com

RM Link :
http://zs-aaa-001.nm.flipkart.com:8088/cluster/scheduler
http://zs-exp-01.nm.flipkart.com:8088/cluster/scheduler

AM Link :
http://zs-aaa-001.nm.flipkart.com:8088/proxy/application_1427882795362_0070/slideram

http://zs-exp-01.nm.flipkart.com:8088/proxy/application_1427882795362_0070/slideram

On Wed, Apr 8, 2015 at 1:05 AM, Gour Saha
gs...@hortonworks.com
wrote:

Sorry forgot that the AM link not working was the original
issue.

Few more things -
- Seems like you have RM HA setup, right?
- Can you copy paste the complete link of the RM UI and the URL
of
ApplicationMaster (the link which is broken) with actual
hostnames?


-Gour

On 4/7/15, 11:43 AM, Chackravarthy Esakkimuthu
chaku.mi...@gmail.com

wrote:

Since 5 containers are running, which means that Storm daemons
are
already
up and running?


Actually the ApplicationMaster link is not working. It just
blanks
out
printing the following :

This is standby RM. Redirecting to the current active RM:
http://host-name:8088/proxy/application_1427882795362_0070/slideram


And for resources.json, I didn't make any changes and used the copy of
resources-default.json as follows:


{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": {
  },
  "global": {
    "yarn.log.include.patterns": "",
    "yarn.log.exclude.patterns": ""
  },
  "components": {
    "slider-appmaster": {
      "yarn.memory": "512"
    },
    "NIMBUS": {
      "yarn.role.priority": "1",
      "yarn.component.instances": "1",
      "yarn.memory": "2048"
    },
    "STORM_UI_SERVER": {
      "yarn.role.priority": "2",
      "yarn.component.instances": "1",
      "yarn.memory": "1278"
    },
    "DRPC_SERVER": {
      "yarn.role.priority": "3",
      "yarn.component.instances": "1",
      "yarn.memory": "1278"
    },
    "SUPERVISOR": {
      "yarn.role.priority": "4",
      "yarn.component.instances": "1",
      "yarn.memory": "3072"
    }
  }
}



On Tue, Apr 7, 2015 at 11:52 PM, Gour Saha 
gs...@hortonworks.com
wrote:

Chackra sent the attachment directly to me. From what I see
the
cluster
resources (memory and cores) are abundant.

But I also see that only 1 app is running, which is the one we are trying
to debug, and 5 containers are running. So definitely more containers than
just the AM are running.

Can you click on the app master link and copy paste the
content
of
that
page? No need for screen shot. Also please send your
resources
JSON
file.

-Gour

- Sent from my iPhone

On Apr 7, 2015, at 11:01 AM, Jon Maron
jma...@hortonworks.com
wrote:


On Apr 7, 2015, at 1:36 PM, Chackravarthy Esakkimuthu 
chaku.mi...@gmail.com wrote:

@Maron, I could not get the logs even though the application
is
still
running.
It's a 10 node cluster and I logged into one of the node and
executed
the command :

sudo -u hdfs yarn logs -applicationId
application_1427882795362_0070
15/04/07 22:56:09 INFO impl.TimelineClientImpl: Timeline
service
address: http://$HOST:PORT/ws/v1/timeline/
15/04/07 22:56:09 INFO
client.ConfiguredRMFailoverProxyProvider:
Failing
over to rm2
/app-logs/hdfs/logs/application_1427882795362_0070does not
have
any
log
files.

Can you login to the cluster node and look at the logs
directory
(e.g.
in HDP install it would be under /hadoop/yarn/logs IIRC

Re: Need help in starting storm on yarn using slider

2015-04-09 Thread Jon Maron

Aside from the yarn issue you discovered wrt HA, if you have any 
recommendations for usability/diagnosability, please feel free to let us know 
or file JIRAs (e.g.  perhaps the error message below should add “Please make 
sure you are logged in as the application owner” :) )

— Jon

 On Apr 9, 2015, at 11:29 AM, Chackravarthy Esakkimuthu 
 chaku.mi...@gmail.com wrote:
 
 Thanks steve, I was running with 'yarn' user while creating storm
 application on YARN, but forgot to run as 'yarn' user while checking the
 application status.
 And yeah, connected to zookeeper and checked under /registry/users. (as
 well as the way you suggested)
 
 Thanks all. Now I am able to submit a sample topology on the deployed storm
 as well. :)
 
 I will try other actions like stopping/killing the running instance. Thanks
 guys again!!
 
 
 On Thu, Apr 9, 2015 at 7:03 PM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 
 On 9 Apr 2015, at 12:51, Chackravarthy Esakkimuthu chaku.mi...@gmail.com
 wrote:
 
 *2015-04-09 17:14:44,667 [main] ERROR main.ServiceLauncher -
 /registry/users/chackaravarthy.e/services/org-apache-slider/storm2*
 *2015-04-09 17:14:44,671 [main] INFO  util.ExitUtil - Exiting with status
 44*
 
 
 44 is our exit code for "not found" (== 404); it's saying the registry entry
 did not exist in zookeeper
 
 Looking at the path, I worry about the username chackaravarthy.e; maybe
 it's registering under a different user.
 
 
  1.  Can you get to the Slider AM page via the RM?
  2.  Can you then look at the listing of exported URLs there? Especially
 one registry.
  3.  Click on that and you can then browse a JSON view of the registry;
  4.  add /users to the end of the path to get the listing of all users,
 /users/chackaravarthy.e/  to see the services you have under you ... you
 should be able to continue down to see if there is a storm2 service entry
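
As a rough illustration of why the lookup above ended with exit code 44: the registry path in the error line is keyed by the user running the command, so creating the instance as 'yarn' and then querying it under another account points at a different entry. The sketch below only rebuilds that path; the helper names are hypothetical and are not part of Slider.

```python
# Sketch: rebuild the registry path seen in the error line above.
# registry_path and same_owner are hypothetical helpers, not Slider APIs.

EXIT_NOT_FOUND = 44  # per the thread: Slider's "not found" exit code (cf. HTTP 404)


def registry_path(user: str, instance: str) -> str:
    """Return /registry/users/<user>/services/org-apache-slider/<instance>."""
    return f"/registry/users/{user}/services/org-apache-slider/{instance}"


def same_owner(created_as: str, queried_as: str, instance: str) -> bool:
    """True when both accounts resolve to the same registry entry."""
    return registry_path(created_as, instance) == registry_path(queried_as, instance)


if __name__ == "__main__":
    # The instance was created as 'yarn' but queried as 'chackaravarthy.e'.
    print(registry_path("chackaravarthy.e", "storm2"))
    print(same_owner("yarn", "chackaravarthy.e", "storm2"))
```

Comparing the two paths before running a status command is a cheap way to spot the owner mismatch that was later confirmed in this thread.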

Re: Need help in starting storm on yarn using slider

2015-04-08 Thread Jon Maron

Indications seem to be that the AM is started but the AM URI you’re attempting 
to attach to may be mistaken or there may be something preventing the actual 
connection.  Any chance iptables is enabled?


 On Apr 8, 2015, at 3:44 AM, Gour Saha gs...@hortonworks.com wrote:
 
 Jon was right. I think Storm uses ${USER_NAME} for app_user instead of hard 
 coding it as yarn, unlike hbase. So either user was fine. 
 
 One thing I saw in the AM and RM urls is that they link to 
 zs-aaa-001.nm.flipkart.com and zs-exp-01.nm.flipkart.com. Can you hand edit 
 the AM URL to try both the host aliases?
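
For anyone retrying this, the hand edit amounts to swapping the host part of the proxy URL. A minimal sketch follows, assuming the URL layout quoted elsewhere in this thread (port 8088, /proxy/<application id>/slideram); the function name is made up for illustration.

```python
# Sketch: generate the AM proxy URL for each ResourceManager host alias.
# am_proxy_urls is a hypothetical helper; the URL layout is copied from the
# links quoted in this thread.

def am_proxy_urls(rm_hosts, app_id, port=8088, am_path="slideram"):
    """Return one proxy URL per RM host alias, in the order given."""
    return [f"http://{host}:{port}/proxy/{app_id}/{am_path}" for host in rm_hosts]


if __name__ == "__main__":
    hosts = ["zs-aaa-001.nm.flipkart.com", "zs-exp-01.nm.flipkart.com"]
    for url in am_proxy_urls(hosts, "application_1427882795362_0070"):
        print(url)
```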
 
 I am not sure if the above will work in which case if you could send the 
 entire AM logs then it would be great. 
 
 -Gour
 
 - Sent from my iPhone
 
 On Apr 7, 2015, at 11:08 PM, Chackravarthy Esakkimuthu 
 chaku.mi...@gmail.com wrote:
 
 Tried running with 'yarn' user, but it remains in same state.
 AM link not working, and AM logs are similar.
 
 On Wed, Apr 8, 2015 at 2:14 AM, Gour Saha gs...@hortonworks.com wrote:
 
 In a non-secured cluster you should run as yarn. Can you do that and let
 us know how it goes?
 
 Also you can stop your existing storm instance in hdfs user (run as hdfs
 user) by running stop first -
 slider stop storm1
 
 -Gour
 
 On 4/7/15, 1:39 PM, Chackravarthy Esakkimuthu chaku.mi...@gmail.com
 wrote:
 
 This is not a secured cluster.
 And yes, I used 'hdfs' user while running slider create.
 
 On Wed, Apr 8, 2015 at 2:03 AM, Gour Saha gs...@hortonworks.com wrote:
 
 Which user are you running the slider create command as? Seems like you
 are running as hdfs user. Is this a secured cluster?
 
 -Gour
 
 On 4/7/15, 1:06 PM, Chackravarthy Esakkimuthu chaku.mi...@gmail.com
 wrote:
 
 yes, RM HA has been setup in this cluster.
 
 Active : zs-aaa-001.nm.flipkart.com
 Standby : zs-aaa-002.nm.flipkart.com
 
 RM Link : http://zs-aaa-001.nm.flipkart.com:8088/cluster/scheduler
 http://zs-exp-01.nm.flipkart.com:8088/cluster/scheduler
 
 AM Link :
 http://zs-aaa-001.nm.flipkart.com:8088/proxy/application_1427882795362_0070/slideram
 
 http://zs-exp-01.nm.flipkart.com:8088/proxy/application_1427882795362_0070/slideram
 
 On Wed, Apr 8, 2015 at 1:05 AM, Gour Saha gs...@hortonworks.com
 wrote:
 
 Sorry forgot that the AM link not working was the original issue.
 
 Few more things -
 - Seems like you have RM HA setup, right?
 - Can you copy paste the complete link of the RM UI and the URL of
 ApplicationMaster (the link which is broken) with actual hostnames?
 
 
 -Gour
 
 On 4/7/15, 11:43 AM, Chackravarthy Esakkimuthu
 chaku.mi...@gmail.com
 
 wrote:
 
 Since 5 containers are running, which means that Storm daemons are
 already
 up and running?
 
 
 Actually the ApplicationMaster link is not working. It just blanks
 out
 printing the following :
 
 This is standby RM. Redirecting to the current active RM:
 http://host-name:8088/proxy/application_1427882795362_0070/slideram
 
 
 And for resources.json, I didn't make any changes and used the copy of
 resources-default.json as follows:
 
 
 {
   "schema": "http://example.org/specification/v2.0.0",
   "metadata": {
   },
   "global": {
     "yarn.log.include.patterns": "",
     "yarn.log.exclude.patterns": ""
   },
   "components": {
     "slider-appmaster": {
       "yarn.memory": "512"
     },
     "NIMBUS": {
       "yarn.role.priority": "1",
       "yarn.component.instances": "1",
       "yarn.memory": "2048"
     },
     "STORM_UI_SERVER": {
       "yarn.role.priority": "2",
       "yarn.component.instances": "1",
       "yarn.memory": "1278"
     },
     "DRPC_SERVER": {
       "yarn.role.priority": "3",
       "yarn.component.instances": "1",
       "yarn.memory": "1278"
     },
     "SUPERVISOR": {
       "yarn.role.priority": "4",
       "yarn.component.instances": "1",
       "yarn.memory": "3072"
     }
   }
 }
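
A quick way to sanity-check a resources.json like the one above against the cluster's free memory is to add up what it asks for. The sketch below is an illustration only: it assumes the values are quoted strings, that a component without yarn.component.instances (here slider-appmaster) counts as one instance, and the function name is hypothetical.

```python
# Sketch: total memory (MB) requested by a Slider resources.json.
# Assumption: a missing "yarn.component.instances" means one instance.
import json

RESOURCES_JSON = """
{
  "components": {
    "slider-appmaster": {"yarn.memory": "512"},
    "NIMBUS": {"yarn.component.instances": "1", "yarn.memory": "2048"},
    "STORM_UI_SERVER": {"yarn.component.instances": "1", "yarn.memory": "1278"},
    "DRPC_SERVER": {"yarn.component.instances": "1", "yarn.memory": "1278"},
    "SUPERVISOR": {"yarn.component.instances": "1", "yarn.memory": "3072"}
  }
}
"""


def total_memory_mb(resources: dict) -> int:
    """Sum instances * yarn.memory over all components."""
    total = 0
    for component in resources["components"].values():
        instances = int(component.get("yarn.component.instances", "1"))
        total += instances * int(component["yarn.memory"])
    return total


if __name__ == "__main__":
    print(total_memory_mb(json.loads(RESOURCES_JSON)))
```

With the values quoted in this thread the five containers ask for 8188 MB in total, which is consistent with the observation that cluster resources were not the bottleneck here.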
 
 
 
 On Tue, Apr 7, 2015 at 11:52 PM, Gour Saha gs...@hortonworks.com
 wrote:
 
 Chackra sent the attachment directly to me. From what I see the
 cluster
 resources (memory and cores) are abundant.
 
 But I also see that only 1 app is running, which is the one we are trying
 to debug, and 5 containers are running. So definitely more containers than
 just the AM are running.
 
 Can you click on the app master link and copy paste the content of
 that
 page? No need for screen shot. Also please send your resources
 JSON
 file.
 
 -Gour
 
 - Sent from my iPhone
 
 On Apr 7, 2015, at 11:01 AM, Jon Maron
 jma...@hortonworks.com
 wrote:
 
 
 On Apr 7, 2015, at 1:36 PM, Chackravarthy Esakkimuthu 
 chaku.mi...@gmail.com wrote:
 
 @Maron, I could not get the logs even though the application is
 still
 running.
 It's a 10 node cluster and I logged into one of the node and
 executed
 the command :
 
 sudo -u hdfs yarn logs -applicationId
 application_1427882795362_0070
 15/04/07 22:56:09 INFO impl.TimelineClientImpl: Timeline service
 address: http://$HOST:PORT/ws/v1/timeline/
 15/04/07 22:56:09 INFO client.ConfiguredRMFailoverProxyProvider:
 Failing
 over to rm2
 /app-logs/hdfs/logs/application_1427882795362_0070does

Re: Need help in starting storm on yarn using slider

2015-04-07 Thread Jon Maron


On Apr 7, 2015, at 1:36 PM, Chackravarthy Esakkimuthu 
chaku.mi...@gmail.com wrote:

@Maron, I could not get the logs even though the application is still running.
It's a 10 node cluster and I logged into one of the node and executed the 
command :

sudo -u hdfs yarn logs -applicationId application_1427882795362_0070
15/04/07 22:56:09 INFO impl.TimelineClientImpl: Timeline service address: 
http://$HOST:PORT/ws/v1/timeline/
15/04/07 22:56:09 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm2
/app-logs/hdfs/logs/application_1427882795362_0070does not have any log files.

Can you login to the cluster node and look at the logs directory (e.g. in HDP 
install it would be under /hadoop/yarn/logs IIRC)?



@Gour, Please find the attachment.

On Tue, Apr 7, 2015 at 10:57 PM, Gour Saha 
gs...@hortonworks.com wrote:
Can you take a screenshot of your RM UI and send it over? It is usually
available in a URI similar to http://c6410.ambari.apache.org:8088/cluster.
I am specifically interested in seeing the Cluster Metrics table.

-Gour

On 4/7/15, 10:17 AM, Jon Maron 
jma...@hortonworks.com wrote:


 On Apr 7, 2015, at 1:14 PM, Jon Maron 
 jma...@hortonworks.com wrote:


 On Apr 7, 2015, at 1:08 PM, Chackravarthy Esakkimuthu
chaku.mi...@gmail.com wrote:

 Thanks for the reply guys!
 Container allocation happened successfully.

 *RoleStatus{name='slider-appmaster', key=0, minimum=0, maximum=1,
 desired=1, actual=1,*
 *RoleStatus{name='STORM_UI_SERVER', key=2, minimum=0, maximum=1,
desired=1,
 actual=1, *
 *RoleStatus{name='NIMBUS', key=1, minimum=0, maximum=1, desired=1,
 actual=1, *
 *RoleStatus{name='DRPC_SERVER', key=3, minimum=0, maximum=1, desired=1,
 actual=1, *
 *RoleStatus{name='SUPERVISOR', key=4, minimum=0, maximum=1, desired=1,
 actual=1,*

 Also, have put some logs specific to a container.. (nimbus) Same set of
 logs available for other Roles also (except Supervisor which has only
first
 2 lines of below logs)

 *Installing NIMBUS on container_e04_1427882795362_0070_01_02.*
 *Starting NIMBUS on container_e04_1427882795362_0070_01_02.*
 *Registering component container_e04_1427882795362_0070_01_02*
 *Requesting applied config for NIMBUS on
 container_e04_1427882795362_0070_01_02.*
 *Received and processed config for
 container_e04_1427882795362_0070_01_02___NIMBUS*

 Does this result in any intermediate state?

 @Maron, I didn't configure any port specifically.. do I need to? Also, I
 don't see any error msg in AM logs wrt port conflict.

 My only concern was whether you were actually accession the web UIs at
the correct host and port.  If you are then the next step is probably to
look at the actual storm/hbase logs.  You can use the "yarn logs
-applicationId .." command.

*accessing* ;)



 Thanks,
 Chackra



 On Tue, Apr 7, 2015 at 9:02 PM, Jon Maron 
 jma...@hortonworks.com
wrote:


 On Apr 7, 2015, at 11:03 AM, Billie Rinaldi
billie.rina...@gmail.com
 wrote:

 One thing you can check is whether your system has enough resources
to
 allocate all the containers the app needs.  You will see info like
the
 following in the AM log (it will be logged multiple times over the
life
 of
 the AM).  In this case, the master I requested was allocated but the
 tservers were not.
 RoleStatus{name='ACCUMULO_TSERVER', key=2, desired=2, actual=0,
 requested=2, releasing=0, failed=0, started=0, startFailed=0,
 completed=0,
 failureMessage=''}
 RoleStatus{name='ACCUMULO_MASTER', key=1, desired=1, actual=1,
 requested=0,
 releasing=0, failed=0, started=0, startFailed=0, completed=0,
 failureMessage=''}

 You can also check the "Scheduler" link on the RM Web UI to get a sense of
 whether you are resource constrained.

 Are you certain that you are attempting to invoke the correct port?
The
 listening ports are dynamically allocated by Slider.



 On Tue, Apr 7, 2015 at 3:29 AM, Chackravarthy Esakkimuthu 
 chaku.mi...@gmail.com wrote:

 Hi All,

 I am new to Apache slider and would like to contribute.

 Just to start with, I am trying out running storm and  hbase on
yarn
 using slider following the guide :




http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/running_applications_on_slider/index.html#Item1.1

 In both (storm and hbase) the cases, the ApplicationMaster gets
launched
 and still running, but the ApplicationMaster link not working, and
from
 AM
 logs, I don't see any errors.

 How do I debug from this? Please help me.
 Incase if there is any other mail thread with respect this, please
point
 out to me. Thanks in advance.

 Thanks,
 Chackra

Re: Need help in starting storm on yarn using slider

2015-04-07 Thread Jon Maron


 On Apr 7, 2015, at 1:14 PM, Jon Maron jma...@hortonworks.com wrote:
 
 
 On Apr 7, 2015, at 1:08 PM, Chackravarthy Esakkimuthu 
 chaku.mi...@gmail.com wrote:
 
 Thanks for the reply guys!
 Container allocation happened successfully.
 
 *RoleStatus{name='slider-appmaster', key=0, minimum=0, maximum=1,
 desired=1, actual=1,*
 *RoleStatus{name='STORM_UI_SERVER', key=2, minimum=0, maximum=1, desired=1,
 actual=1, *
 *RoleStatus{name='NIMBUS', key=1, minimum=0, maximum=1, desired=1,
 actual=1, *
 *RoleStatus{name='DRPC_SERVER', key=3, minimum=0, maximum=1, desired=1,
 actual=1, *
 *RoleStatus{name='SUPERVISOR', key=4, minimum=0, maximum=1, desired=1,
 actual=1,*
 
 Also, have put some logs specific to a container.. (nimbus) Same set of
 logs available for other Roles also (except Supervisor which has only first
 2 lines of below logs)
 
 *Installing NIMBUS on container_e04_1427882795362_0070_01_02.*
 *Starting NIMBUS on container_e04_1427882795362_0070_01_02.*
 *Registering component container_e04_1427882795362_0070_01_02*
 *Requesting applied config for NIMBUS on
 container_e04_1427882795362_0070_01_02.*
 *Received and processed config for
 container_e04_1427882795362_0070_01_02___NIMBUS*
 
 Does this result in any intermediate state?
 
 @Maron, I didn't configure any port specifically.. do I need to? Also, I
 don't see any error msg in AM logs wrt port conflict.
 
 My only concern was whether you were actually accession the web UIs at the 
 correct host and port.  If you are then the next step is probably to look at 
 the actual storm/hbase logs.  You can use the “yarn logs -applicationId ..” 
 command.

*accessing* ;)  

 
 
 Thanks,
 Chackra
 
 
 
 On Tue, Apr 7, 2015 at 9:02 PM, Jon Maron jma...@hortonworks.com wrote:
 
 
 On Apr 7, 2015, at 11:03 AM, Billie Rinaldi billie.rina...@gmail.com
 wrote:
 
 One thing you can check is whether your system has enough resources to
 allocate all the containers the app needs.  You will see info like the
 following in the AM log (it will be logged multiple times over the life
 of
 the AM).  In this case, the master I requested was allocated but the
 tservers were not.
 RoleStatus{name='ACCUMULO_TSERVER', key=2, desired=2, actual=0,
 requested=2, releasing=0, failed=0, started=0, startFailed=0,
 completed=0,
 failureMessage=''}
 RoleStatus{name='ACCUMULO_MASTER', key=1, desired=1, actual=1,
 requested=0,
 releasing=0, failed=0, started=0, startFailed=0, completed=0,
 failureMessage=''}
 
 You can also check the “Scheduler” link on the RM Web UI to get a sense of
 whether you are resource constrained.
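
The RoleStatus check described above can be automated along these lines. This is a sketch only, assuming the log line format shown in this message; the regular expression and function name are illustrative and not part of Slider.

```python
# Sketch: list roles whose actual container count is below the desired count,
# based on RoleStatus lines copied from the AM log.
import re

# Matches name, desired and actual within a single RoleStatus{...} record.
ROLE_RE = re.compile(
    r"RoleStatus\{name='(?P<name>[^']+)'[^}]*?"
    r"desired=(?P<desired>\d+),\s*actual=(?P<actual>\d+)"
)

SAMPLE_LOG = """
RoleStatus{name='ACCUMULO_TSERVER', key=2, desired=2, actual=0,
requested=2, releasing=0, failed=0, started=0, startFailed=0,
completed=0, failureMessage=''}
RoleStatus{name='ACCUMULO_MASTER', key=1, desired=1, actual=1,
requested=0, releasing=0, failed=0, started=0, startFailed=0, completed=0,
failureMessage=''}
"""


def unallocated_roles(log_text: str) -> dict:
    """Map role name to the number of containers still missing."""
    missing = {}
    for match in ROLE_RE.finditer(log_text):
        desired = int(match.group("desired"))
        actual = int(match.group("actual"))
        if actual < desired:
            missing[match.group("name")] = desired - actual
    return missing


if __name__ == "__main__":
    print(unallocated_roles(SAMPLE_LOG))
```

An empty result means every role reached its desired count, which is the situation reported later in this thread for the Storm roles.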
 
 Are you certain that you are attempting to invoke the correct port?  The
 listening ports are dynamically allocated by Slider.
 
 
 
 On Tue, Apr 7, 2015 at 3:29 AM, Chackravarthy Esakkimuthu 
 chaku.mi...@gmail.com wrote:
 
 Hi All,
 
 I am new to Apache slider and would like to contribute.
 
 Just to start with, I am trying out running storm and  hbase on yarn
 using slider following the guide :
 
 
 
 http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/running_applications_on_slider/index.html#Item1.1
 
 In both cases (storm and hbase), the ApplicationMaster gets launched
 and is still running, but the ApplicationMaster link is not working, and
 the AM logs show no errors.
 
 How do I debug this? Please help me.
 If there is another mail thread about this, please point me to it.
 Thanks in advance.
 
 Thanks,
 Chackra

Re: Invalid port 0 for storm instances

2015-03-30 Thread Jon Maron

We transitioned from “ALLOCATED_PORT” to “PER_CONTAINER” - I just can’t recall 
whether that was in the 0.60 timeframe?

— Jon

 On Mar 30, 2015, at 1:21 PM, Sumit Mohanty sumit.moha...@gmail.com wrote:
 
 Did you create the storm package yourself? Can you share the appConfig.json
 you are using?
 
 On Mon, Mar 30, 2015 at 10:07 AM, Nitin Aggarwal 
 nitin3588.aggar...@gmail.com wrote:
 
 It's a typo just in the mail. It is replaced correctly for some of the
 supervisors configuration.
 I am running slider version 0.60.
 
 On Mon, Mar 30, 2015 at 10:04 AM, Sumit Mohanty smoha...@hortonworks.com
 wrote:
 
 If the exact text was ALOOCATED_PORT then replacing them with
 ALLOCATED_PORT might solve it.
 
 Otherwise, which version of Slider are you using?
 
 From: Nitin Aggarwal nitin3588.aggar...@gmail.com
 Sent: Monday, March 30, 2015 9:51 AM
 To: dev@slider.incubator.apache.org
 Subject: Invalid port 0 for storm instances
 
 Hi,
 
 I am trying to run storm cluster with 200 instances, using Slider. While
 submitting topologies to the cluster, some of the workers failed to start
 due to error.
 
 2015-03-27T13:27:40.168-0400 b.s.event [ERROR] Error when processing
 event
 java.lang.IllegalArgumentException: invalid port: 0
 at backtype.storm.security.auth.ThriftClient.init(ThriftClient.java:54)
 ~[storm-core-0.9.4-SNAPSHOT.jar:0.9.4-SNAPSHOT]
 at backtype.storm.utils.NimbusClient.init(NimbusClient.java:47)
 ~[storm-core-0.9.4-SNAPSHOT.jar:0.9.4-SNAPSHOT]
 at backtype.storm.utils.NimbusClient.init(NimbusClient.java:43)
 ~[storm-core-0.9.4-SNAPSHOT.jar:0.9.4-SNAPSHOT]
 at
 
 backtype.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:36)
 ~[storm-core-0.9.4-SNAPSHOT.jar:0.9.4-SNAPSHOT]
 at backtype.storm.utils.Utils.downloadFromMaster(Utils.java:253)
 ~[storm-core-0.9.4-SNAPSHOT.jar:0.9.4-SNAPSHOT]
 at backtype.storm.daemon.supervisor$fn__6900.invoke(supervisor.clj:482)
 ~[storm-core-0.9.4-SNAPSHOT.jar:0.9.4-SNAPSHOT]
 at clojure.lang.MultiFn.invoke(MultiFn.java:241) ~[clojure-1.5.1.jar:na]
 at
 
 
 backtype.storm.daemon.supervisor$mk_synchronize_supervisor$this__6820.invoke(supervisor.clj:371)
 ~[storm-core-0.9.4-SNAPSHOT.jar:0.9.4-SNAPSHOT]
 at backtype.storm.event$event_manager$fn__2825.invoke(event.clj:40)
 ~[storm-core-0.9.4-SNAPSHOT.jar:0.9.4-SNAPSHOT]
 at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
 at java.lang.Thread.run(Thread.java:724) [na:1.7.0_40]
 
 I found that some of the supervisors don't have the correct
 configurations.
 Their configuration, still have markers like ${NIMBUS_HOST},
 ${NIMBUS.ALOOCATED_PORT}.
 
 Are these markers expected in supervisor storm configuration ?
 
 Thanks
 Nitin
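
One way to spot supervisors in this state is to scan the rendered configuration for leftover ${...} markers. The sketch below assumes the configuration has already been loaded into a dictionary; the sample keys and values are illustrative rather than copied from a real supervisor, and the function name is hypothetical.

```python
# Sketch: find configuration values that still contain ${...} placeholders.
import re

MARKER_RE = re.compile(r"\$\{[^}]+\}")

# Illustrative sample only: two unresolved values and one resolved value.
SAMPLE_CONFIG = {
    "nimbus.host": "${NIMBUS_HOST}",
    "nimbus.thrift.port": "${NIMBUS.ALLOCATED_PORT}",
    "ui.port": "8080",
}


def unresolved_markers(config: dict) -> dict:
    """Map each key to the placeholder markers left in its value."""
    found = {}
    for key, value in config.items():
        markers = MARKER_RE.findall(str(value))
        if markers:
            found[key] = markers
    return found


if __name__ == "__main__":
    for key, markers in unresolved_markers(SAMPLE_CONFIG).items():
        print(key, markers)
```

A port value that was never substituted would plausibly be read as 0, which would match the "invalid port: 0" error above, though the thread does not confirm that mechanism.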
 
 
 
 
 
 -- 
 thanks
 Sumit

Re: [VOTE] Apache Slider Incubating Release 0.70.0-incubating

2015-03-09 Thread Jon Maron

+1

 On Mar 6, 2015, at 11:14 PM, Gour Saha gs...@hortonworks.com wrote:
 
 Hello,
 
 This is a call for a vote on Apache Slider Incubating 0.70.0-incubating 
 release.
 
 This is a source+binary release.
 
 The issues fixed in this release are listed at:
 https://issues.apache.org/jira/browse/SLIDER/fixforversion/12327847 (or the 
 shortened URL http://s.apache.org/AnM)
 
 
 Artifacts at
 https://repository.apache.org/content/repositories/orgapacheslider-1004/org/apache/slider
 
 
 Git source tag:
 https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=tag;h=refs/tags/slider-0.70.0-incubating
 
 
 PGP keys at
 http://pgp.mit.edu/pks/lookup?op=vindex&search=gourks...@apache.org
 
 
 Build instructions at:
 http://slider.incubator.apache.org/developing/building.html
 
 
 Vote will be open for 72 hours
 
 [ ] +1 approve
 [ ] +0 no opinion
 [ ] -1 disapprove (and reason why)
 
 
 To start, here's my vote: +1
 
 -Gour


Re: Intermittent issues accessing zookeeper

2015-02-25 Thread Jon Maron


 On Feb 25, 2015, at 10:16 AM, Gour Saha gs...@hortonworks.com wrote:
 
 Can you check the zk logs at /var/log/zookeeper/zookeeper.out and see if
 you find something?

I see a bunch of these but I’m assuming these are normal for a disconnected 
client connection:

2015-02-23 19:40:21,320 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket 
connection for client /192.168.64.105:34018 (no session established for client)
2015-02-23 19:41:21,311 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /192.168.64.105:34031
2015-02-23 19:41:21,319 - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of 
stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, 
likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2015-02-23 19:41:21,319 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket 
connection for client /192.168.64.105:34031 (no session established for client)
2015-02-23 19:41:52,896 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /192.168.64.104:46949
2015-02-23 19:41:52,896 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client 
attempting to establish new session at /192.168.64.104:46949
2015-02-23 19:41:52,900 - INFO  [CommitProcessor:4:ZooKeeperServer@617] - 
Established session 0x44bb7e82d730002 with negotiated timeout 1 for client 
/192.168.64.104:46949
2015-02-23 19:41:52,916 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket 
connection for client /192.168.64.104:46949 which had sessionid 
0x44bb7e82d730002
2015-02-23 19:42:21,313 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /192.168.64.105:34054
2015-02-23 19:42:21,314 - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of 
stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, 
likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2015-02-23 19:42:21,314 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket 
connection for client /192.168.64.105:34054 (no session established for client)
2015-02-23 19:42:38,263 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /192.168.64.1:52286
2015-02-23 19:42:38,265 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client 
attempting to establish new session at /192.168.64.1:52286
2015-02-23 19:42:38,269 - INFO  [CommitProcessor:4:ZooKeeperServer@617] - 
Established session 0x44bb7e82d730003 with negotiated timeout 4 for client 
/192.168.64.1:52286
2015-02-23 19:42:39,316 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket 
connection for client /192.168.64.1:52286 which had sessionid 0x44bb7e82d730003
2015-02-23 19:43:14,665 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /192.168.64.105:34129
2015-02-23 19:43:14,667 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client 
attempting to establish new session at /192.168.64.105:34129
2015-02-23 19:43:14,672 - INFO  [CommitProcessor:4:ZooKeeperServer@617] - 
Established session 0x44bb7e82d730004 with negotiated timeout 1 for client 
/192.168.64.105:34129
2015-02-23 19:43:14,681 - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of 
stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x44bb7e82d730004, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)


 
 Also, see if you can use the zkCli.sh client to query in a loop for few
 minutes (with few secs of interval between queries) and see if you get
 similar intermittent connection issues?

Tried “watch -n 5 ./zkCli.sh ls /services”.  Didn’t see any issue on the client 
side, though interestingly at times the connection seemed to be using the IPv6 
address?
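
The same loop can be run without zkCli.sh. The sketch below sends ZooKeeper's "ruok" four-letter command and records which attempts failed; it assumes four-letter commands are enabled on the server, and both function names are hypothetical.

```python
# Sketch: probe a ZooKeeper server repeatedly and record failed attempts.
# Assumes the server answers the "ruok" four-letter command with "imok".
import socket
import time


def zk_ruok(host, port=2181, timeout=2.0):
    """Return True if the server replies 'imok' to 'ruok', else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(4) == b"imok"
    except OSError:
        # Connection refused, timeout, DNS failure and similar errors.
        return False


def probe_loop(host, port=2181, attempts=10, interval=5.0, probe=zk_ruok):
    """Run the probe `attempts` times; return indices of failed attempts."""
    failures = []
    for i in range(attempts):
        if not probe(host, port):
            failures.append(i)
        if i < attempts - 1:
            time.sleep(interval)
    return failures


if __name__ == "__main__":
    print(probe_loop("localhost", attempts=3, interval=1.0))
```

Passing the probe as a parameter keeps the loop testable without a live quorum. To look into the IPv6 observation, socket.getaddrinfo(host, port) lists the address families the host name resolves to.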

 
 -Gour
 
 On 2/25/15, 6:53 AM, Jon Maron jma...@hortonworks.com wrote:
 
 I've noticed that I'm having intermittent issues accessing the zookeeper

Intermittent issues accessing zookeeper

2015-02-25 Thread Jon Maron

I’ve noticed that I’m having intermittent issues accessing the zookeeper quorum 
during “destroy” attempts:

2015-02-25 09:48:02,345 [main] WARN  client.SliderClient 
(SliderClient.java:getZkClient(523)) - Unable to connect to zookeeper quorum 
c6402.ambari.apache.org:2181,c6404.ambari.apache.org:2181,c6403.ambari.apache.org:2181,c6405.ambari.apache.org:2181
java.net.ConnectException: Unable to connect to ZK quorum
at 
org.apache.slider.core.zk.BlockingZKWatcher.waitForZKConnection(BlockingZKWatcher.java:63)
at 
org.apache.slider.client.SliderClient.getZkClient(SliderClient.java:518)
at 
org.apache.slider.client.SliderClient.deleteZookeeperNode(SliderClient.java:458)
at 
org.apache.slider.client.SliderClient.actionDestroy(SliderClient.java:550)
at org.apache.slider.client.SliderClient.exec(SliderClient.java:383)
at 
org.apache.slider.client.SliderClient.runService(SliderClient.java:348)
at 
org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
at 
org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
at 
org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
at 
org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
at org.apache.slider.Slider.main(Slider.java:49)
2015-02-25 09:48:02,656 [main] DEBUG client.SliderClient 
(SliderClient.java:deleteZookeeperNode(474)) - Unable to recursively delete zk 
node /services/slider/users/jmaron/hbase-test
2015-02-25 09:48:02,656 [main] DEBUG client.SliderClient 
(SliderClient.java:deleteZookeeperNode(475)) - Reason: 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /services/slider/users/jmaron/hbase-test
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
at org.apache.slider.core.zk.ZKIntegration.stat(ZKIntegration.java:164)
at 
org.apache.slider.core.zk.ZKIntegration.exists(ZKIntegration.java:160)
at 
org.apache.slider.client.SliderClient.deleteZookeeperNode(SliderClient.java:460)
at 
org.apache.slider.client.SliderClient.actionDestroy(SliderClient.java:550)
at org.apache.slider.client.SliderClient.exec(SliderClient.java:383)
at 
org.apache.slider.client.SliderClient.runService(SliderClient.java:348)
at 
org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
at 
org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
at 
org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
at 
org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
at org.apache.slider.Slider.main(Slider.java:49)

Any ideas on why that may occur?  My cluster is running on a set of VMs on my 
development box.  These failed ZK interactions will subsequently yield issues 
in trying to recreate the given application (in this case HBase)

— Jon
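
To narrow down whether one quorum member is responsible, the connection string in the warning above can be split and each member tested on its own. A minimal sketch follows, assuming the usual comma-separated host:port form; the function name is hypothetical and IPv6 literals are not handled.

```python
# Sketch: split a ZooKeeper quorum string into (host, port) pairs.
# parse_quorum is a hypothetical helper; IPv6 literals are out of scope.

def parse_quorum(quorum: str, default_port: int = 2181):
    """Return a list of (host, port) tuples, one per quorum member."""
    members = []
    for entry in quorum.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if sep:
            members.append((host, int(port)))
        else:
            # No explicit port given: fall back to the default client port.
            members.append((entry, default_port))
    return members


if __name__ == "__main__":
    quorum = ("c6402.ambari.apache.org:2181,c6404.ambari.apache.org:2181,"
              "c6403.ambari.apache.org:2181,c6405.ambari.apache.org:2181")
    for host, port in parse_quorum(quorum):
        print(host, port)
```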

Re: Slider-develop - Build # 543 - Failure

2015-02-03 Thread Jon Maron

Looks like a failure from my commit. Will diagnose and fix. 


 On Feb 3, 2015, at 12:42 PM, Apache Jenkins Server 
 jenk...@builds.apache.org wrote:
 
 The Apache Jenkins build system has built Slider-develop (build #543)
 
 Status: Failure
 
 Check console output at https://builds.apache.org/job/Slider-develop/543/ to 
 view the results.

Re: Google Summer of Code 2015 is coming

2015-02-03 Thread Jon Maron

* Full app package tutorial
* native support for scripting langs other than python (agent side)

On Feb 3, 2015, at 5:06 PM, Sumit Mohanty sumit.moha...@gmail.com wrote:

 Some that come to my mind are:
 
 * Time line server, integration
 * Service Registry browser and API in Scala (e.g. some language other than
 java)
 
 -Sumit
 
 On Tue, Feb 3, 2015 at 1:45 PM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 Maybe we should start with some discussion on the list.
 
 What do we think we could get someone with no knowledge of the project to do
 in 3 months? Assume Java coding skills, but probably no experience of OSS and
 test-first processes.
 
 What would we like them to do?
 
 -steve
 
 
 
 On 3 February 2015 at 20:41:25, Sumit Mohanty (smoha...@hortonworks.com)
 wrote:
 
 I think we should. How do we gather project ideas?
 Should we create JIRAs, select one or more from the list, and then add the
 gsoc2015 label to the selected ones?
 
 -Sumit
 
 From: Steve Loughran ste...@hortonworks.com
 Sent: Tuesday, February 03, 2015 7:51 AM
 To: Slider Dev
 Subject: Fw: Google Summer of Code 2015 is coming
 
 Do we want to do some GSoC mentoring in the summer?
 
 This is a great chance to get something self-contained done.
 
 Of course, it must (a) be something interesting and (b) have the guarantee
 of support from the rest of the team. The goal is to get someone used to
 developing OSS projects, as part of a team: not coding on their own fixing
 random JIRAs.
 
 
 
 
 On 2 February 2015 at 22:46:09, Ulrich Stärk (u...@apache.org) wrote:
 
 Hello PMCs (incubator Mentors, please forward this email to your podlings),
 
 Google Summer of Code [1] is a program sponsored by Google allowing
 students to spend their summer
 working on open source software. Students will receive stipends for
 developing open source software
 full-time for three months. Projects will provide mentoring and project
 ideas, and in return have
 the chance to get new code developed and - most importantly - to identify
 and bring in new committers.
 
 The ASF will apply as a participating organization meaning individual
 projects don't have to apply
 separately.
 
 If you want to participate with your project we ask you to do the
 following things by no later than
 2015-02-13 19:00 UTC (applications from organizations close a week later)
 
 1. understand what it means to be a mentor [2].
 
 2. record your project ideas.
 
 Just create issues in JIRA, label them with gsoc2015, and they will show
 up at [3]. Please be as
 specific as possible when describing your idea. Include the programming
 language, the tools and
 skills required, but try not to scare potential students away. They are
 supposed to learn what's
 required before the program starts.
 
 Use labels, e.g. for the programming language (java, c, c++, erlang,
 python, brainfuck, ...) or
 technology area (cloud, xml, web, foo, bar, ...) and record them at [5].
 
 Please use the COMDEV JIRA project for recording your ideas if your
 project doesn't use JIRA (e.g.
 httpd, ooo). Contact d...@community.apache.org if you need assistance.
 
 [4] contains some additional information (will be updated for 2015
 shortly).
 
 3. subscribe to ment...@community.apache.org (restricted to potential
 mentors, meant to be used as a private list; general discussions on the
 public d...@community.apache.org list as much as possible, please). Use a
 recognized address when subscribing (@apache.org or one of your alias
 addresses on record).
 
 Note that the ASF isn't accepted as a participating organization yet,
 nevertheless you *have to*
 start recording your ideas now or we might not get accepted.
 
 Over the years we were able to complete hundreds of projects successfully.
 Some of our prior
 students are active contributors now! Let's make this year a success again!
 
 Cheers,
 
 Uli
 
 P.S.: Except for the private parts (label spreadsheet mostly), this email
 is free to be shared
 publicly if you want to.
 
 [1] http://www.google-melange.com/gsoc/homepage/google/gsoc2015
 [2] http://community.apache.org/guide-to-being-a-mentor.html
 [3] http://s.apache.org/gsoc2015ideas
 [4] http://community.apache.org/gsoc.html
 [5] http://s.apache.org/gsoclabels
 
 
 
 
 -- 
 thanks
 Sumit

Re: draft proposal: split up email list

2015-02-02 Thread Jon Maron


On Feb 2, 2015, at 7:38 AM, Steve Loughran ste...@hortonworks.com wrote:

 following on from the discussion, I'd like to outline the proposal to bring a 
 vote on two topics. I'm not sure what level of contribution defines 
 bindingness (committer?)
 
 
 Proposal #1:  create notificati...@slider.incubator.apache.org for
 notifications.
 Machine-generated emails would go here:
 
  1.  JIRA edit/update notifications,
  2.  git/svn commits
 
 Issue 1a? Jenkins: do we want Jenkins broke/fixed emails to still go to dev@? 
 Especially on the state change events? That is, if Jenkins is still broken, 
 don't send updates, but if someone does just break it, let everyone know.

+1

 
 Issue 1b? JIRA completions: the hadoop dev lists are set up so that when 
 JIRAs are completed they are announced on the dev@ list. This means that even 
 if you don't care about ongoing work, you do get to keep an eye on what has 
 just finished.
 

It probably makes sense to continue that practice.

 
 Proposal #2: create user@slider.incubator.apache.org for general usage
 emails
 
 
 I think we could do a vote on this while still resolving/tweaking issues 1a 
 and 1b; that's a matter of tuning the email settings in those builds and JIRA 
 projects until we are happy.
 
 -Steve

Re: [VOTE] Apache Slider Incubating Release 0.61.0-incubating

2015-01-30 Thread Jon Maron

Perhaps someone better versed with maven can attempt this:

I’m just trying to come up with the correct set of parameters to pull this 
version from the repository as a dependency (just to test the ability to 
reference it as a project dependency):

 mvn org.apache.maven.plugins:maven-dependency-plugin:2.1:get 
-DrepoUrl=https://repository.apache.org/content/repositories/orgapacheslider-1003/
 -Dartifact=org.apache.slider:slider-asembly:0.61.0-incubating

I can’t seem to come up with a group ID (org.apache.slider), artifact ID (I’ve 
tried slider and slider-assembly), and version (I’ve tried 0.61-incubating, 
0.61-incubating-all, etc.) that will successfully identify and install this as 
a dependency.
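
When checking by hand which coordinates actually exist under a staging URL, it can help to see how the standard Maven 2 repository layout turns a groupId:artifactId:version triple into a path. The following is a minimal sketch under stated assumptions: the function name `artifact_url` and the choice of the `pom` file are illustrative only, and nothing here confirms which artifact IDs are really published in the staging repository. The repository URL, group ID, and candidate artifact IDs are the ones mentioned above.

```python
def artifact_url(repo_url: str, group_id: str, artifact_id: str,
                 version: str, packaging: str = "jar") -> str:
    """Return the URL where a Maven 2 layout repository would store a file.

    Layout: <repo>/<groupId with dots as slashes>/<artifactId>/<version>/
            <artifactId>-<version>.<packaging>
    """
    group_path = group_id.replace(".", "/")
    filename = f"{artifact_id}-{version}.{packaging}"
    return "/".join([repo_url.rstrip("/"), group_path, artifact_id, version, filename])


# Staging repository URL taken from the vote email.
REPO = "https://repository.apache.org/content/repositories/orgapacheslider-1003/"

# Candidate artifact IDs from the message; whether each one exists is NOT
# confirmed here. Every published artifact has a .pom, so that is the file
# to look for when browsing the repository.
for candidate in ("slider", "slider-assembly"):
    print(artifact_url(REPO, "org.apache.slider", candidate,
                       "0.61.0-incubating", packaging="pom"))
```

If none of the printed URLs resolves in a browser, the coordinates are wrong regardless of how the dependency plugin is invoked.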

— Jon

On Jan 28, 2015, at 10:09 AM, Steve Loughran ste...@hortonworks.com wrote:

This is a call for a vote on Apache Slider 0.61.0-incubating release

This is a source+binary release.

This release extends the previous RC with:
-full transient license check metadata
-use of -incubating in the maven versions

Artifacts at
https://repository.apache.org/content/repositories/orgapacheslider-1003/

Git source tag:
https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h=64a8bac068e6801748fb973dbfb590bc62c60935


PGP keys at
http://pgp.mit.edu:11371/pks/lookup?op=vindex&search=ste...@apache.org

Build instructions at:
http://slider.incubator.apache.org/developing/building.html


Vote will be open for 72 hours

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)


Here's my vote: +1 (binding)

-Steve

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.

Re: [VOTE] Apache Slider Incubating Release 0.61.0-incubating

2015-01-30 Thread Jon Maron

+1

On Jan 28, 2015, at 10:09 AM, Steve Loughran ste...@hortonworks.com wrote:

 This is a call for a vote on Apache Slider 0.61.0-incubating release
 
 This is a source+binary release.
 
 This release extends the previous RC with:
 -full transient license check metadata
 -use of -incubating in the maven versions
 
 Artifacts at
 https://repository.apache.org/content/repositories/orgapacheslider-1003/
 
 Git source tag:
 https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h=64a8bac068e6801748fb973dbfb590bc62c60935
 
 
 PGP keys at
 http://pgp.mit.edu:11371/pks/lookup?op=vindex&search=ste...@apache.org
 
 Build instructions at:
 http://slider.incubator.apache.org/developing/building.html
 
 
 Vote will be open for 72 hours
 
 [ ] +1 approve
 [ ] +0 no opinion
 [ ] -1 disapprove (and reason why)
 
 
 Here's my vote: +1 (binding)
 
 -Steve
 

Re: [VOTE] Apache Slider Incubating Release 0.61.0-incubating

2015-01-30 Thread Jon Maron

That really is the one we need working, so I imagine the issue I note below is 
actually a non-issue…

On Jan 30, 2015, at 2:24 PM, Sumit Mohanty smoha...@hortonworks.com wrote:

 Modified Ambari code to take dependency on the specific mvn repo path to
 validate.
 
 POM - snippet
 
 <repositories>
  <repository>
   <id>ASF Snapshots</id>
   <url>http://cp.mcafee.com/d/2DRPow96Qm4QT67PhOOedTdETsuK-MOOMrhKUqem76kkkPqdT7HLIcILCQrK6zBxVUsY-r4ltzm0a9RzeI35oBqsKr9RzeI35oBqsKrKgR0xpx_HYyehd7b0XHTbFILTVBAQsCzCXfbnhIyyGyyDOEuvkzaT0QSyr76XYCONtxdZZBcTsSjDdqymp7RGQBeIiTqJJyvY01ORoj-4YzWRqiDmcLFw07lJIj-xYAfynhLsSxVAL7VJNwnsoh75N5noRwsjH6to6aNaQVs5kfAxYuB2Pp3pRzSHroD_00jrOoVAS2_id41Fr38o18Qqnjh08broDVEw63V8v4Qg1IbdAdDa14Qg0LP_SDCy2_e8ZsLa5CO6PB0yrshdz8-53/</url>
  </repository>
 
 
 Verified with the build ..
 ...
 Downloading: 
 http://cp.mcafee.com/d/avndzgQd6Qm4QT67PhOOedTdETsuK-MOOMrhKUqem76kkkPqdT7HLIcILCQrK6zBxVUsY-r4ltzm0a9RzeI35oBqsKr9RzeI35oBqsKrKgR0xpx_HYyehd7b0XHTbFILTVBAQsCzCXfbnhIyyGyyDOEuvkzaT0QSCr76XYCONtxdZZBcTsSjDdqymp7RGQBeIiTqJJyvY01ORoj-4YzWRqiDmcLFw07lJIj-xYAfynhLtD009J3P9ufPrz0KUMyebyaKNH0UDmcWMclylFOUaEv93UZa5CO6PH7JmSNf-00CTANP9I5-Aq83iS6gM2hEQKCy0gmSNfPh0c7Og-9Ew3omr8rek29Ew1vD_Jfd45-shWVukbdAdDa14SUyrwKLp
 /apache/slider/slider-core/0.61.0-incubating/slider-core-0.61.0-incubating.jar
 
 ...
 
 
 Let me verify the other aspects prior to voting.
 
 On 1/30/15, 8:42 AM, Jon Maron jma...@hortonworks.com wrote:
 
 Perhaps someone better versed with maven can attempt this:
 
 I’m just trying to come up with the correct set of parameters to pull
 this version from the repository as a dependency (just to test the
 ability to reference it as a project dependency):
 
 mvn org.apache.maven.plugins:maven-dependency-plugin:2.1:get
 -DrepoUrl=https://repository.apache.org/content/repositories/orgapacheslider-1003/
 -Dartifact=org.apache.slider:slider-asembly:0.61.0-incubating
 
 I can’t seem to come up with a group ID (org.apache.slider), artifact ID
 (I’ve tried slider and slider-assembly), and version (I’ve tried
 0.61-incubating, 0.61-incubating-all, etc.) that will successfully
 identify and install this as a dependency.
 
 - Jon
 
 On Jan 28, 2015, at 10:09 AM, Steve Loughran ste...@hortonworks.com wrote:
 
 This is a call for a vote on Apache Slider 0.61.0-incubating release
 
 This is a source+binary release.
 
 This release extends the previous RC with:
 -full transient license check metadata
 -use of -incubating in the maven versions
 
 Artifacts at
 https://repository.apache.org/content/repositories/orgapacheslider-1003/
 
 Git source tag:
 https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=commit;h=64a8bac068e6801748fb973dbfb590bc62c60935
 
 
 PGP keys at
 http://pgp.mit.edu:11371/pks/lookup?op=vindex&search=ste...@apache.org
 
 Build instructions at:
 http://slider.incubator.apache.org/developing/building.html
 
 
 Vote will be open for 72 hours
 
 [ ] +1 approve
 [ ] +0 no opinion
 [ ] -1 disapprove (and reason why)
 
 
 Here's my vote: +1 (binding)
 
 -Steve
 

Re: review board

2015-01-30 Thread Jon Maron

Great!

I plan to start submitting my commits for review - I’ll figure out a process
that works for me (rbt, web posting, etc) and post my thoughts. One thing that
will probably be important is managing the scope of commits - reviews should be
kept to a manageable size (ideally around 5 files or so). That means that some
JIRAs will probably be best addressed by multiple incremental commits rather
than one large one (if portions can be broken into commits that do not break
the overall functionality).

— Jon

On Jan 30, 2015, at 9:18 AM, Ted Yu yuzhih...@gmail.com wrote:

The INFRA JIRA has been resolved.

We should be able to use reviewboard now.

On Jan 27, 2015, at 10:34 AM, Steve Loughran ste...@hortonworks.com wrote:

infra all hang out on hipchat

https://www.hipchat.com/gdAiIcNyE

On 27 January 2015 at 16:34, Jon Maron jma...@hortonworks.com wrote:

Ted made that change early in January, but the issue has yet to even get
assigned. Is there any mechanism available to get this issue some
attention? Can we escalate it?

— Jon

On Jan 5, 2015, at 10:14 AM, Billie Rinaldi billie.rina...@gmail.com
wrote:

It might help (a little) if the component were listed as ReviewBoard. I
think only Ted can change this.

On Mon, Jan 5, 2015 at 7:06 AM, Jon Maron jma...@hortonworks.com
wrote:

Is there a way to poke one of these to get it moving? :)

— Jon

On Dec 21, 2014, at 11:41 PM, Ted Yu yuzhih...@gmail.com wrote:

Logged INFRA-8927

Cheers

On Sun, Dec 21, 2014 at 4:37 PM, Sumit Mohanty
sumit.moha...@gmail.com
wrote:

You will need an INFRA jira to create a slider review group and also
link
the git repo for slider. I had created one INFRA jira for Ambari a
while
back for the same. See if that can be a reference.

On Monday, December 22, 2014, Jon Maron jma...@hortonworks.com
wrote:

I’ve wondered about that myself. I’m wondering if anything is really
required administratively, or can we simply install RBTools and start
posting reviews?

On Dec 21, 2014, at 4:28 PM, Ted Yu yuzhih...@gmail.com wrote:

Hi,
Who should I contact so that patch for Slider can be added to
https://reviews.apache.org/r/new/ ?

Cheers


--
thanks
Sumit



Re: updating the release process; what to do about branches

2015-01-29 Thread Jon Maron

I would favor the current mapping - a little more intuitive IMHO

On Jan 29, 2015, at 5:40 AM, Steve Loughran ste...@hortonworks.com wrote:

 
 (apologies for any formatting confusion here; I am migrating to Outlook web 
 apps)
 
 
 I quite like the git flow bit of the SourceTree GUI; it looks like we can 
 change the policy there so that master becomes the branch git flow branches 
 from and merges with. Git flow's own master could then be a release branch, 
 and ignored if we don't use that aspect of the git flow process.
 
 [gitflow "branch"]
   master = master
   develop = develop
 
 
 From: Josh Elser josh.el...@gmail.com
 Sent: 28 January 2015 18:02
 To: dev@slider.incubator.apache.org
 Subject: Re: updating the release process; what to do about branches
 
 Could also just remove master in its current use and
 s/develop/master/, leveraging the master branch as the normal place
 things are implemented.
 
 It really doesn't matter in the end (it's just a name), but, if this is
 also signifying a move away from git-flow, it makes more sense to me to
 use master instead of develop.
 
 
 Sumit Mohanty wrote:
 I vote for removing the master branch. This is in the line of what I was
 also wondering since we have created branches for 0.60 and 0.70. Branches
 can remain the source of truth for the release and can facilitate minor
 releases if needed.
 
 On Wed, Jan 28, 2015 at 8:58 AM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 The latest release process document is now in svn at
 site/trunk/content/developing/releasing.md
 
 It hasn't yet propagated to the HTML view, when it does it will be at
 
 http://slider.incubator.apache.org/developing/releasing.html
 
 I think we've outgrown the git flow release process.
 
 The feature branch seems to work well, but the release process has
 everything merged into the branch master:
 
 - It doesn't handle long-lived release/supported branches
 - Merging into master/ can create convoluted dependency graphs,
   resulting in a commit graph (and hence git commit ID) which is different
   from what is released.
 
 What are we to do?
 
 I'm wondering if we should get rid of that master/ branch altogether.
 
 Instead we could have some tags which we could move around:
 
- last_branch_6_stable_release
- last_branch_6_dev_release
- last_branch_7_stable_release
- last_branch_7_dev_release
- last_stable_release
- last_dev_release
 
 If you fetch all tags then check out by tag, you end with whatever version
 we think is last on a branch; the stable/dev releases can even cross
 branches as something migrates from development to stable
 
 During the release process, instead of doing git merge master work, we'd
 just delete some tags, create the new ones and then push them to the
 origin.
 
 Thoughts?
 
 -steve
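
The tag shuffle described above (delete a movable tag, recreate it on the new release commit, push it to origin) could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `move_tag_commands` and the placeholder commit `abc1234` are hypothetical, and the script only prints the git commands rather than running them. The tag name comes from the list in the message.

```python
import shlex


def move_tag_commands(tag, commit, remote="origin"):
    """Return the git commands that re-point a movable tag at a new commit.

    Mirrors the proposal: delete the old tag locally and on the remote,
    create it again on the release commit, then push it.
    """
    return [
        ["git", "tag", "-d", tag],                     # drop the local tag
        ["git", "push", remote, f":refs/tags/{tag}"],  # delete it on the remote
        ["git", "tag", tag, commit],                   # recreate on the new commit
        ["git", "push", remote, f"refs/tags/{tag}"],   # publish the new tag
    ]


# "abc1234" is a placeholder commit ID, not a real Slider commit.
for command in move_tag_commands("last_stable_release", "abc1234"):
    print(shlex.join(command))
```

One thing to keep in mind with this approach: anyone who already fetched the old tag keeps it locally until they delete it or force a tag fetch, so moved tags are only reliable on fresh clones.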
 

Re: [DISCUSS]: mailing list rework

2015-01-29 Thread Jon Maron


On Jan 29, 2015, at 1:35 PM, Steve Loughran ste...@hortonworks.com wrote:

 A migration to Outlook Web Access today has brought home how much 
 machine-generated noise there is on this list, primarily JIRA notifying of 
 commits and changes. Gmail had been hiding a lot of this for me.
 
 
 I think this noise is getting in the way of any conversations in the list 
 itself.
 

+1

 
 I'm going to propose
 
  1.  we create a new list purely for the always-sent JIRA notifications
  2.  the -dev list still gets the creation and completion messages, but 
 nothing else
  3.  people who care about an issue and don't want to subscribe to the new 
 mailing list can just hit the Watch button.
 
 I don't have any ideas or preference for a list name

-build?

 
 Alongside this: do we want a separate users list?
 

I don’t think it’s necessary at this point.

 Thoughts?
 
 -steve

Re: review board

2015-01-27 Thread Jon Maron

Thanks.  Best suggestion they had was to add a comment and trigger an email 
(they weren’t certain of the responsible party).  We’ll see if that works.

— Jon

On Jan 27, 2015, at 1:34 PM, Steve Loughran ste...@hortonworks.com wrote:

 infra all hang out on hipchat
 
 https://www.hipchat.com/gdAiIcNyE
 
 On 27 January 2015 at 16:34, Jon Maron jma...@hortonworks.com wrote:
 
 Ted made that change early in January, but the issue has yet to even get
 assigned.  Is there any mechanism available to get this issue some
 attention?  Can we escalate it?
 
 — Jon
 
 On Jan 5, 2015, at 10:14 AM, Billie Rinaldi billie.rina...@gmail.com
 wrote:
 
 It might help (a little) if the component were listed as ReviewBoard.  I
 think only Ted can change this.
 
 On Mon, Jan 5, 2015 at 7:06 AM, Jon Maron jma...@hortonworks.com
 wrote:
 
 Is there a way to poke one of these to get it moving?  :)
 
 — Jon
 
 On Dec 21, 2014, at 11:41 PM, Ted Yu yuzhih...@gmail.com wrote:
 
 Logged INFRA-8927
 
 Cheers
 
 On Sun, Dec 21, 2014 at 4:37 PM, Sumit Mohanty 
 sumit.moha...@gmail.com
 wrote:
 
 You will need an INFRA jira to create a slider review group and also
 link
 the git repo for slider. I had created one INFRA jira for Ambari a
 while
 back for the same. See if that can be a reference.
 
 On Monday, December 22, 2014, Jon Maron jma...@hortonworks.com
 wrote:
 
 I’ve wondered about that myself.  I’m wondering if anything is really
 required administratively, or can we simply install RBTools and start
 posting reviews?
 
 On Dec 21, 2014, at 4:28 PM, Ted Yu yuzhih...@gmail.com wrote:
 
 Hi,
 Who should I contact so that patch for Slider can be added to
 https://reviews.apache.org/r/new/ ?
 
 Cheers
 
 
 
 
 
 --
 thanks
 Sumit
 
 
 
 
 
 
 
 



Re: Deploying Accumulo with Slider View Ambari 1.7 Question

2015-01-27 Thread Jon Maron

You can try to obtain the set of logs associated with the application using the
“yarn” command:

yarn logs -applicationId <application ID>
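
If only a container name is at hand, as in the failure message quoted below, the application ID can be read off it: YARN container IDs embed the cluster timestamp and application sequence number. A minimal sketch, assuming the usual `container_<clusterTimestamp>_<appSeq>_<attempt>_<container>` shape (newer Hadoop releases may add an epoch field such as `e17` after `container`); the function name is hypothetical:

```python
def application_id_from_container(container_id: str) -> str:
    """Derive the YARN application ID embedded in a container ID."""
    parts = container_id.split("_")
    if parts[0] != "container":
        raise ValueError(f"not a container ID: {container_id!r}")
    fields = parts[1:]
    # Skip the optional epoch field (for example "e17") if present.
    if fields and fields[0].startswith("e"):
        fields = fields[1:]
    if len(fields) != 4:
        raise ValueError(f"unexpected container ID layout: {container_id!r}")
    cluster_timestamp, app_sequence = fields[0], fields[1]
    return f"application_{cluster_timestamp}_{app_sequence}"


# Container name copied from the error message in this thread.
container = "container_1422305155071_0001_01_14"
print("yarn logs -applicationId " + application_id_from_container(container))
```

For the container named in this thread that gives application_1422305155071_0001, which is the value to pass to the command above.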

— Jon

On Jan 27, 2015, at 9:33 AM, Billie Rinaldi billie.rina...@gmail.com wrote:

Moving this to the Slider dev list. I am not seeing the cause of the error
in the AM log. The Accumulo master process keeps failing to start, so there
may be a configuration problem. Would you be able to track down logs for one
of the failed Accumulo master containers to see what they say?

On Tue, Jan 27, 2015 at 6:08 AM, Avellanet, Tatiana (HP Technology Center)
tatiana.avella...@hp.com wrote:
Good morning:

My name is Tatiana Avellanet and I work for the Hewlett-Packard Technology
Center in Puerto Rico. I have been trying, unsuccessfully, to deploy Accumulo
using Slider View in Ambari 1.7 for some time. I am using the package at
http://public-repo-1.hortonworks.com/HDP/centos6/2.x/GA/2.2.0.0/slider-app-packages/accumulo/slider-accumulo-app-package-1.6.1.2.2.0.0-2041.zip.
I created the /user/yarn folder in HDFS and assigned rights to the yarn user.
Then I went to the Slider View interface, selected Create App, selected
ACCUMULO as the application type, used accumulo as the name with all default
configurations, and created the application.

The application goes from Accepted to Running and then after about 10 minutes
it fails with the error:

Unstable Application Instance : - failed with component ACCUMULO_MASTER
failing 6 times (0 in startup); threshold is 5 - last failure: Failure
container_1422305155071_0001_01_14 on host ambariServer:
http://ambariServer:19888/jobhistory/logs/ambariServer:45454/container_1422305155071_0001_01_14/ctx/yarn

Attached is the slider-out.txt log from Yarn. I can successfully deploy storm
using this same environment but Accumulo keeps failing. This is a small
cluster, with only 3 nodes (all of them are NodeManagers and Slider Clients).

I can’t figure out what is wrong. Can you help me with this?

Thanks,

Tatiana Avellanet

Software Designer V

ESSN SC PuertoRico - Hewlett-Packard Company

Tel. : 1 787 658 4039

email : tatiana.avella...@hp.com

Hwy 110 North KM 5.1, Bld 1 Aguadilla, PR 00603


Re: review board

2015-01-27 Thread Jon Maron

Ted made that change early in January, but the issue has yet to even get 
assigned.  Is there any mechanism available to get this issue some attention?  
Can we escalate it?

— Jon

On Jan 5, 2015, at 10:14 AM, Billie Rinaldi billie.rina...@gmail.com wrote:

 It might help (a little) if the component were listed as ReviewBoard.  I
 think only Ted can change this.
 
 On Mon, Jan 5, 2015 at 7:06 AM, Jon Maron jma...@hortonworks.com wrote:
 
 Is there a way to poke one of these to get it moving?  :)
 
 — Jon
 
 On Dec 21, 2014, at 11:41 PM, Ted Yu yuzhih...@gmail.com wrote:
 
 Logged INFRA-8927
 
 Cheers
 
 On Sun, Dec 21, 2014 at 4:37 PM, Sumit Mohanty sumit.moha...@gmail.com
 wrote:
 
 You will need an INFRA jira to create a slider review group and also
 link
 the git repo for slider. I had created one INFRA jira for Ambari a while
 back for the same. See if that can be a reference.
 
 On Monday, December 22, 2014, Jon Maron jma...@hortonworks.com wrote:
 
 I’ve wondered about that myself.  I’m wondering if anything is really
 required administratively, or can we simply install RBTools and start
 posting reviews?
 
 On Dec 21, 2014, at 4:28 PM, Ted Yu yuzhih...@gmail.com wrote:
 
 Hi,
 Who should I contact so that a patch for Slider can be added to
 https://reviews.apache.org/r/new/ ?
 
 Cheers
 
 
 
 --
 thanks
 Sumit
 
 
 

Re: Locality results in instance shut-down due to single bad instance

2015-01-08 Thread Jon Maron

+1.  A good way to provide the functionality while leveraging existing 
mechanisms

On Jan 8, 2015, at 8:46 AM, Gour Saha gs...@hortonworks.com wrote:

 +1 on that
 
 That's also what I meant when I said - 
 I don't think we have logic that applies data locality and then, after a
 certain number of failures (a threshold), tries without data locality at least
 once before giving up. It would be a good idea to file a JIRA with this
 requirement.
 
 -Gour
 
 - Sent from my iPhone
 
 On Jan 8, 2015, at 3:30 AM, Steve Loughran ste...@hortonworks.com wrote:
 
 thinking about this some more, we could use our tracking of node
 reliability to tune our placement decisions.
 
 
  1. We add a recent failures field to the node entries, alongside the
  total failures
  2. Our scheduled failure count resetter will set that field to zero,
  alongside the component failures
  3. When Slider has to request a new container, unless the placement
  policy is STRICT, we will continue to use the (persisted) placement history
  4. Except now, if a node has a recent failure count above some
  threshold, we don't ask for a container on that node...we just ask for
  anywhere placement.
 
 What do people think?
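
The four steps above can be sketched roughly as follows. This is only an illustration in Python, not Slider code: the `NodeEntry` fields, `choose_placement`, and the `threshold` parameter are hypothetical names, and the sketch assumes that when the most recently used node looks unreliable, older history entries are tried before falling back to an "anywhere" request.

```python
from dataclasses import dataclass


@dataclass
class NodeEntry:
    """Per-node failure bookkeeping (field names are illustrative)."""
    hostname: str
    total_failures: int = 0
    recent_failures: int = 0

    def record_failure(self) -> None:
        # A launch failure counts against both the lifetime and recent totals.
        self.total_failures += 1
        self.recent_failures += 1

    def reset_recent(self) -> None:
        # Called by the scheduled failure-count resetter (step 2).
        self.recent_failures = 0


def choose_placement(history, strict, threshold):
    """Return a hostname to request, or None for an "anywhere" request.

    history:   list of NodeEntry, most recently used first
    strict:    True when the placement policy is STRICT
    threshold: hypothetical limit on recent failures per node
    """
    for node in history:
        # STRICT placement must keep using the history (step 3);
        # otherwise skip nodes whose recent failures exceed the threshold (step 4).
        if strict or node.recent_failures <= threshold:
            return node.hostname
    return None  # no trustworthy node in the history: ask for anywhere
```

Because only the recent count is consulted and it is reset on a schedule, a node that misbehaved once becomes eligible again after the next reset.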
 
 On 7 January 2015 at 09:50, Steve Loughran ste...@hortonworks.com wrote:
 
 the history of where things were is retained in the RoleHistory
 structures, persisted to HDFS and reread on startup. for each component
 type, it's sorted by most-recent-first.
 
 When a container is needed, the AM looks in that history first, and looks
 through the list of previously used nodes for that component type,
 skipping any that already have an instance of that component running. The
 chosen node is taken off the list, so there are no duplicates.
 (Exception: if the component type doesn't have any locality, then
 although the history is tracked, it's not used for placement.)
 
 
 
 When a placement on the node comes in, it's taken off the pending
 list.
 
 There's one small issue here: no way to tie requests to allocations. We
 don't really care which request allocates a component to a node, we just
 like to track outstanding requests for explicit nodes. The algorithm is:
 - allocation to a requested node: remove node from list of outstanding
   explicit requests
 - allocation to another node: do nothing while there are outstanding
   requests
 - all outstanding requests satisfied: clean the list of outstanding
   placed requests.
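
As a rough illustration of that bookkeeping, here is a minimal Python sketch. The class and method names are made up for the example and do not correspond to Slider's actual classes; it only assumes that the AM knows how many requests are still unsatisfied.

```python
class OutstandingRequests:
    """Tracks which nodes have been explicitly asked for (illustrative only)."""

    def __init__(self):
        self.requested_nodes = set()  # nodes named in explicit requests
        self.pending = 0              # requests not yet matched by an allocation

    def request(self, hostname=None):
        # hostname=None models an "anywhere" request.
        self.pending += 1
        if hostname is not None:
            self.requested_nodes.add(hostname)

    def on_allocation(self, hostname):
        self.pending = max(0, self.pending - 1)
        # Allocation on a requested node: drop it from the outstanding set.
        # Allocation on any other node: discard() is a no-op.
        self.requested_nodes.discard(hostname)
        # All outstanding requests satisfied: clean out whatever is left.
        if self.pending == 0:
            self.requested_nodes.clear()
```

Requests and allocations are never paired up individually; the set is simply pruned as allocations arrive and wiped once nothing is pending.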
 
 Now, fun happens when a container fails on a newly allocated node, and it's
 here that some policy tuning may be required.
 
 It comes down to this: what is the best way to react when a component
 fails to start, either immediately, or shortly after startup? This can be a
 sign of a major problem ("node doesn't run my app") or something transient
 ("port still considered in use").
 
 If it's a transient problem, there's no harm in asking again.
 
 If it's a permanent problem, we need to make the decision that this node is
 bad, at least for that specific component.
 
 I think right now, on a startup/launch time failure, the failing node is
 placed at the back of the list of recently used nodes, and the failure counts
 of both the node and the component are incremented. Although there's a YARN API
 where an application can provide blacklist hints to YARN, we're not
 currently using it.
 
 I think what you may be seeing is that Slider is repeatedly asking for the
 same node: it's failing and going to the back of the list of previously
 used nodes, but as there is only one, it's being asked for again.
 
 We can tune this -maybe- but it gets complex.
 
 1. If the placement policy is STRICT, then we must ask for that previously
 used node. (Though thinking about it, the component must have started at
 least once at some point in the past...I don't know if the special case of
 previously allocated but never started is detected and handled)
 
 2. If the placement is location-preferred (the default), how best to react to a
 launch failure? Completely cut that node off the list of suitable targets?
 Or try again a few more times? If it's a transient problem, retry gives
 locality without over-reacting. If it's a permanent problem, then retrying
 is the wrong policy.
 
 What should we do here? We are tracking failures in NodeEntry entries, in
 a map of the cluster built up (NodeMap), but not currently using failure
 counts there to make decisions. If we do think about using it, we'll have
 to think about not just keeping the count of failures, but resetting it on
 an interval, the way we now do with component failure counts.
 
 -steve
 
 
 
 
 
 On 7 January 2015 at 02:50, Gour Saha gs...@hortonworks.com wrote:
 
 Nitin,
 
 I don't think we have logic that applies data locality and then, after a
 certain number of failures (a threshold), tries without data locality at least
 once before giving up. It would be a good idea to file a JIRA with this
 requirement.
 
 -Gour
 
 
 On Tue, Jan 6, 2015 at 5:12 PM, Nitin Aggarwal 
 nitin3588.aggar...@gmail.com
 wrote:
 
 I

Re: Locality results in instance shut-down due to single bad instance

2015-01-08 Thread Jon Maron


On Jan 8, 2015, at 1:58 PM, Steve Loughran ste...@hortonworks.com wrote:

 On 8 January 2015 at 18:15, Nitin Aggarwal nitin3588.aggar...@gmail.com
 wrote:
 
 +1. Also, should a node that has reached the failure threshold ever be
 considered for allocation again?
 It is possible that, due to temporary issues with a few nodes, the locality
 node set could end up with many more nodes than needed.
 
 
 There are various options here, and the right policy depends on cluster size.
 
 A key one surfaced in MapReduce a long, long time ago: *if you only
 have one node, don't blacklist it.*
 
 In fact, for any small cluster you don't want to think about blacklisting
 so much as sorting nodes by reliability and preferring those you consider
 more reliable. But that's complicated. State of the art here is probably
 The φ Accrual Failure Detector: http://ddg.jaist.ac.jp/pub/HDY+04.pdf
 
 I think what we can do at its simplest is just reset the counter every day.
 Even when we don't trust a node enough to explicitly ask for containers on it,
 we could just say "anywhere" and have it picked up sometimes, especially on
 smaller clusters where there are fewer servers to choose from.
 
 There is also a blacklist API in YARN, which we've not looked at yet. It
 lets us tell YARN which nodes to blacklist, which to whitelist. We'd have
 to track which ones were blacklisted and whitelist them after a while,
 unless we want to end up blacklisting the entire cluster. so again: some
 kind of reset operation, though if we track when a node was blacklisted, we
 could give it a specific time interval (say 24h) to get better.
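
The "track when a node was blacklisted and release it after an interval" idea could look roughly like this. It is a Python sketch under stated assumptions: `ExpiringBlacklist` and its parameters are hypothetical, the 24h default mirrors the figure above, and the step of actually passing additions and removals to YARN is deliberately left out.

```python
import time


class ExpiringBlacklist:
    """Blacklist whose entries lapse after ttl_seconds (illustrative only)."""

    def __init__(self, cluster_size, ttl_seconds=24 * 3600, clock=time.monotonic):
        self.cluster_size = cluster_size
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._since = {}  # hostname -> time at which it was blacklisted

    def expire(self):
        """Release nodes that have served their interval; return their names."""
        now = self.clock()
        released = [h for h, t in self._since.items()
                    if now - t >= self.ttl_seconds]
        for hostname in released:
            del self._since[hostname]
        return released

    def add(self, hostname):
        """Blacklist a node; return False if that would cover the whole cluster."""
        self.expire()
        if hostname in self._since:
            return True
        # Never blacklist the last usable node (covers the one-node cluster).
        if len(self._since) + 1 >= self.cluster_size:
            return False
        self._since[hostname] = self.clock()
        return True

    def current(self):
        self.expire()
        return sorted(self._since)
```

Since this state lives in memory, it would be lost on an AM failure unless it were persisted alongside the placement history.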
 
 
 All this failure tracking history is lost on an AM failure incidentally,
 possibly including the list of blacklisted nodes handed off to YARN.

Can YARN provide that list when requested?

 
 
 
 On Thu, Jan 8, 2015 at 9:19 AM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 https://issues.apache.org/jira/browse/SLIDER-743 it is then.
 

Re: looking at a 0.62 release; 0.60 as published ASF binaries

2015-01-08 Thread Jon Maron

+1

On Jan 8, 2015, at 2:13 PM, Sumit Mohanty smoha...@hortonworks.com wrote:

 Sounds good. +1
 
 On Thu, Jan 8, 2015 at 11:03 AM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 The Apache Ambari team needs Maven Central-hosted Slider artifacts, which means
 that we need to release some.
 
 I'm hoping just to backport the maven build changes needed to enable this
 onto the 0.6x branch, so that we can give them something that matches what
 they've been building against, slider-0.60.
 
 We can then vote it through, followed by getting it approved by the
 incubator PMC.
 
 I think after that (say, 2-3 weeks later) we can do a 0.70 release of where
 we are now. That's bug fixes and other things done since 0.60.
 
 -steve
 

Re: rm-ing the surplus slider-agent.tar.gz from the slider tar file

2015-01-06 Thread Jon Maron


On Jan 6, 2015, at 9:54 AM, Steve Loughran ste...@hortonworks.com wrote:

 https://issues.apache.org/jira/browse/SLIDER-641 covers the fact that we
 have two slider-agent.tar.gz files in the tar, one in /lib, the other in
 /agent.
 
 I want to cut one: which one should I keep? I'm assuming the agent one

+1

 

Re: review board

2015-01-05 Thread Jon Maron

Is there a way to poke one of these to get it moving?  :)

— Jon

On Dec 21, 2014, at 11:41 PM, Ted Yu yuzhih...@gmail.com wrote:

 Logged INFRA-8927
 
 Cheers
 
 On Sun, Dec 21, 2014 at 4:37 PM, Sumit Mohanty sumit.moha...@gmail.com
 wrote:
 
 You will need an INFRA jira to create a slider review group and also link
 the git repo for slider. I had created one INFRA jira for Ambari a while
 back for the same. See if that can be a reference.
 
 On Monday, December 22, 2014, Jon Maron jma...@hortonworks.com wrote:
 
 I’ve wondered about that myself.  I’m wondering if anything is really
 required administratively, or can we simply install RBTools and start
 posting reviews?
 
 On Dec 21, 2014, at 4:28 PM, Ted Yu yuzhih...@gmail.com wrote:
 
 Hi,
 Who should I contact so that a patch for Slider can be added to
 https://reviews.apache.org/r/new/ ?
 
 Cheers
 
 
 
 
 
 --
 thanks
 Sumit
 



Re: How to set jvm option for HMaster

2015-01-05 Thread Jon Maron

I’m not certain that would work for the HBase Master - it’s a Slider App Master 
setting.

There are some specific heap size settings for the HMaster:

"site.global.hbase_master_heapsize": "1024m",

It looks like tuning the master launch command-line options more specifically 
would require some script/configuration modifications.

— Jon


On Jan 5, 2015, at 10:32 AM, Ted Yu yuzhih...@gmail.com wrote:

 杨浩:
 See the following example appConfig.json snippet:
 
   "components": {
     "HBASE_MASTER": {
     },
     "slider-appmaster": {
       "jvm.heapsize": "256M",
 
 Cheers
 
 On Mon, Jan 5, 2015 at 7:27 AM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 can you post your appconfig.json file, under your .slider/CLUSTERNAME dir?
 
 On 5 January 2015 at 09:38, 杨浩 yangha...@gmail.com wrote:
 
 I have set jvm.opts, but it has no effect.
 
 

Re: how to get get the port of memcached

2014-12-29 Thread Jon Maron

 the
 result by executing shell command in Java.
 
 2014-12-23 19:39 GMT+08:00 Jon Maron jma...@hortonworks.com:
 
 Are you suggesting that the client interact with the REST API to retrieve
 results (instead of the current rpc mechanism)?  That is part of the plan.
 
 On Dec 23, 2014, at 1:45 AM, 杨浩 yangha...@gmail.com wrote:
 
 I think one way to do so is to expose a REST API that returns the result
 of the slider shell command.
 
 2014-12-23 14:22 GMT+08:00 Gour Saha gs...@hortonworks.com:
 
 Do you mean REST API?
 
 Significant work is going on to expose a REST API in Slider for the next
 major release. We still don't know the best way to expose a REST API to
 retrieve the AM host:port (via the YARN REST API, maybe), as the REST endpoint
 itself will be served by the Slider AM host:port, but we will surely come up
 with an elegant solution. Suggestions are welcome!!
 
 Check the uber jira for more details -
 https://issues.apache.org/jira/browse/SLIDER-151
 
 -Gour
 
 On Mon, Dec 22, 2014 at 1:50 AM, 杨浩 yangha...@gmail.com
 wrote:
 
 Hi, I've got the AM port through the shell command slider list
 +applicationName+ --state RUNNING, but after discussing it with my boss, we
 think it's an ugly approach for a production environment.
 
 Can we get the AM host:port through a Java API?
 
 2014-12-16 9:07 GMT+08:00 Gour Saha gs...@hortonworks.com:
 
 Once the app is up and running can you hit the following url and copy paste
 what you see?
 
 http://yang:8088/proxy/application_id/ws/v1/slider/publisher/slider
 
 where the application_id will be the value from the property
 info.am.app.id in the status output above.
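
For reference, the URL above can be assembled from the status output along these lines. This is a small Python sketch: `publisher_url` is a hypothetical helper, and port 8088 is simply the ResourceManager web port used in the example URL, not something the sketch discovers.

```python
from urllib.parse import quote


def publisher_url(rm_host, application_id, rm_port=8088):
    """Build the RM-proxied Slider publisher URL for a running application.

    application_id is the value of info.am.app.id from `slider status`.
    """
    return (
        f"http://{rm_host}:{rm_port}/proxy/"
        f"{quote(application_id, safe='')}/ws/v1/slider/publisher/slider"
    )


# Values taken from the status output quoted in this thread.
print(publisher_url("yang", "application_1418350976699_0004"))
```

Going through the ResourceManager proxy means the caller does not need to know the AM's own host and port in advance.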
 
 -Gour
 
 On Thu, Dec 11, 2014 at 8:23 PM, 杨浩 yangha...@gmail.com
 wrote:
 
 yang@yang:/usr/local/slider$ slider status memcached1
 2014-12-12 12:22:58,305 [main] INFO  client.RMProxy -
 Connecting
 to
 ResourceManager at yang/127.0.0.1:8032
 2014-12-12 12:22:58,597 [main] INFO  client.SliderClient - {
 version : 1.0,
 name : memcached1,
 type : agent,
 state : 3,
 createTime : 1418357615354,
 updateTime : 1418357615603,
 originConfigurationPath :
 
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/snapshot,
 generatedConfigurationPath :
 
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/generated,
 dataPath :
 
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/database,
 options : {
   slider.am.restart.supported : true,
   site.global.security_enabled : false,
   internal.application.home : null,
   internal.queue : default,
   application.name : memcached1,
   slider.cluster.directory.permissions : 0770,
   site.global.slider.allowed.ports : 48000, 49000,
 50001-50010,
   internal.tmp.dir :
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/tmp,
   java_home : /opt/soft/jdk,
   internal.snapshot.conf.path :
 
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/snapshot,
   env.MALLOC_ARENA_MAX : 4,
   zookeeper.path :
 /services/slider/users/yang/memcached1,
   internal.container.failure.shortlife : 6,
   internal.application.image.path : null,
   internal.generated.conf.path :
 
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/generated,
   site.fs.default.name : hdfs://yang:8020,
   site.global.additional_cp : /usr/lib/hadoop/lib/*,
   zookeeper.hosts : 127.0.0.1,
   internal.provider.name : agent,
   internal.data.dir.path :
 
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/database,
   site.fs.defaultFS : hdfs://yang:8020,
   site.global.memory_val : 200M,
   slider.data.directory.permissions : 0770,
   site.global.listen_port :
 ${MEMCACHED.ALLOCATED_PORT}{PER_CONTAINER},
   zookeeper.quorum : 127.0.0.1:2181,
   site.global.xmx_val : 256m,
   internal.am.tmp.dir :
 
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/tmp/appmaster,
   application.def :
 .slider/package/MEMCACHED/jmemcached-1.0.0.zip,
   internal.container.failure.threshold : 5,
   site.global.xms_val : 128m
 },
 info : {
   info.am.agent.status.url : https://yang:60422/;,
   yarn.memory : 2048,
   info.am.app.id : application_1418350976699_0004,
   info.am.agent.status.port : 60422,
   info.am.agent.ops.url : https://yang:47879/;,
   yarn.vcores : 32,
   info.am.container.id :
 container_1418350976699_0004_03_01,
   info.am.attempt.id :
 appattempt_1418350976699_0004_03,
   info.am.rpc.port : 48000,
   info.am.web.port : 49000,
   info.am.web.url : http://yang:49000/;,
   info.am.hostname : yang,
   info.am.agent.ops.port : 47879,
   status.application.build.info : Slider
 Core-0.60.0-incubating
 Built
 against commit# 9e03554f99 on Java 1.6.0_31 by yang,
   status.hadoop.build.info : 2.6.0,
   status.hadoop.deployed.info : branch-2.6.0
 @18e43357c8f927c0695f1e9522859d6a,
   live.time : 12 Dec 2014 04:13:35 GMT,
   live.time.millis : 1418357615354,
   create.time : 12 Dec 2014 04:13:35 GMT,
   create.time.millis : 1418357615354,
   containers.at.am-restart : 0,
   status.time : 12 Dec 2014 04:22:58 GMT,
   status.time.millis : 1418358178437
 },
 statistics : {
   MEMCACHED

Re: how to get get the port of memcached

2014-12-29 Thread Jon Maron

, 2014 at 4:52 PM, Yong Feng fengyong...@gmail.com
 wrote:
 
 Hi Team,
 
 Could anyone shed some light on it?
 
 Thanks,
 
 Yong
 
 On Wed, Dec 24, 2014 at 3:09 PM, Yong Feng fengyong...@gmail.com
 wrote:
 
 Happy Christmas, Slider team.
 
 I'm using this mail thread for a similar question about querying the exported
 port of the Slider sample cluster jmemcached. After I deployed jmemcached on
 Slider, I did not find the entry point of the cluster with the command slider
 status. I have to go to the host on which jmemcached is running and execute
 the ps command to get the allocated port.
 
 Generally speaking, how does a Slider user find the entry point of a deployed
 cluster? OpenStack Heat and Google's K8S allow users to query the entry point
 of their stack/service. As Slider is a similar app orchestrator, I would like
 to know how Slider resolves the issue of service discovery.
 
 Thanks,
 
 Yong
 
 
 
 On Tue, Dec 23, 2014 at 9:06 AM, 杨浩 yangha...@gmail.com wrote:
 
 I think it would be a convenient way. The original idea is to get the
 results of slider shell commands through a REST API. We just don't want to
 get the result by executing a shell command in Java.
 

Re: when i run slider0.40+hadoop2.4 i found the error log in slider-error.log

2014-12-26 Thread Jon Maron


On Dec 26, 2014, at 4:53 AM, liujs3 liu...@asiainfo.com wrote:

 No, I thought Slider should be running like this.
 
 In fact, I configured Slider according to the Slider website. The system
 requirements include openssl, but after I installed openssl, Slider didn't
 work. It said:
 
 ERROR 2014-11-30 22:59:05,697 NetUtil.py:52 - [Errno 1] _ssl.c:510: 
 error:100AE081:elliptic curve routines:EC_GROUP_new_by_curve_name:unknown 
 group
 ERROR 2014-11-30 22:59:05,697 NetUtil.py:54 - SSLError: Failed to connect. 
 Please check openssl library versions.
 
 So I updated the openssl version from
 openssl-devel-1.0.1e-15.el6_6.4.x86_64.rpm to
 openssl-devel-1.0.1e-30.el6_6.4.x86_64.rpm, and then Slider worked.
 
 I haven't configured security in my cluster. How can I confirm whether it is
 a secure cluster?

It may not be a secure cluster.  OpenSSL is used by the AM to generate a 
certificate to support the SSL communication between the agents (running in 
launched containers) and the AM.  So even in a non-secure cluster openssl is 
required on NM hosts for the AM.

 
 -Original Message-
 From: Ted Yu [mailto:yuzhih...@gmail.com] 
 Sent: December 26, 2014 12:33
 To: dev@slider.incubator.apache.org
 Subject: Re: when i run slider0.40+hadoop2.4 i found the error log in 
 slider-error.log
 
 bq. WARN security.SecurityUtils: Command openssl pkcs12
 
 Are you deploying in a secure cluster ?
 Can you try with 0.60.0 and hadoop 2.6.0 ?
 
 Cheers
 
 On Thu, Dec 25, 2014 at 7:17 PM, liujs3 liu...@asiainfo.com wrote:
 
 14/12/24 03:15:35 WARN util.NativeCodeLoader: Unable to load 
 native-hadoop library for your platform... using builtin-java classes 
 where applicable
 14/12/24 03:15:35 INFO appmaster.SliderAppMaster: Login user is ocean
 (auth:SIMPLE)
 14/12/24 03:15:35 INFO appmaster.SliderAppMaster: Slider Core-0.40 
 Built against commit# ${buildNumber} on Java 1.7.0_71 by ocean
 14/12/24 03:15:35 INFO appmaster.SliderAppMaster: Compiled against 
 Hadoop
 2.4.0
 14/12/24 03:15:35 INFO appmaster.SliderAppMaster: Hadoop runtime 
 version
 branch-2.4.0 with source checksum 375b2832a6641759c6eaf6e3e998147 and 
 build date 2014-03-31T08:29Z
 14/12/24 03:15:42 INFO appmaster.SliderAppMaster: Deploying cluster {,
 internal: {
  schema : http://example.org/specification/v2.0.0,
  metadata : {
create.hadoop.deployed.info : branch-2.4.0 
 @375b2832a6641759c6eaf6e3e998147,
create.application.build.info : Slider Core-0.40 Built against 
 commit# ${buildNumber} on Java 1.7.0_71 by ocean,
create.hadoop.build.info : 2.4.0,
create.time.millis : 1419408918397,
create.time : 24 Dec 2014 08:15:18 GMT
  },
  global : {
internal.tmp.dir :
 hdfs://OCNoSQLBJ/user/ocean/.slider/cluster/oceanzk3/tmp/appmaster,
internal.generated.conf.path :
 hdfs://OCNoSQLBJ/user/ocean/.slider/cluster/oceanzk3/generated,
internal.snapshot.conf.path :
 hdfs://OCNoSQLBJ/user/ocean/.slider/cluster/oceanzk3/snapshot,
internal.container.failure.shortlife : 6,
slider.data.directory.permissions : 0770,
application.name : oceanzk3,
slider.cluster.directory.permissions : 0770,
internal.provider.name : agent,
internal.application.image.path :
 hdfs://OCNoSQLBJ/slider/agent/slider-agent.tar.gz,
internal.container.failure.threshold : 5,
internal.data.dir.path :
 hdfs://OCNoSQLBJ/user/ocean/.slider/cluster/oceanzk3/database
  },
  components : {
  }
 },
 resources: {
  schema : http://example.org/specification/v2.0.0,
  metadata : {
  },
  global : {
  },
  components : {
HBASE_MASTER : {
  yarn.memory : 256,
  yarn.role.priority : 1,
  yarn.component.instances : 1
},
slider-appmaster : {
  yarn.memory : 256,
  yarn.vcores : 1,
  yarn.component.instances : 1
},
HBASE_REGIONSERVER : {
  yarn.memory : 256,
  yarn.role.priority : 2,
  yarn.component.instances : 1
}
  }
 },
 appConf :{
  schema : http://example.org/specification/v2.0.0,
  metadata : {
  },
  global : {
site.global.security_enabled : false,
site.hbase-site.hbase.regionserver.global.memstore.upperLimit :
 0.4,
site.global.hbase_regionserver_heapsize : 1024m,
site.global.monitor_protocol : http,
site.hbase-site.hbase.hregion.max.filesize : 10737418240,
site.hbase-site.hbase.regionserver.port : 0,
zookeeper.path : /services/slider/users/ocean/oceanzk3,
site.hbase-site.hbase.zookeeper.quorum : ${ZK_HOST},
site.global.hbase_root_password : secret,
site.core-site.fs.defaultFS : hdfs://OCNoSQLBJ,
site.global.hadoop_user : ocean,
site.hbase-site.hbase.tmp.dir : ${AGENT_WORK_ROOT}/work/app/tmp,
site.hbase-site.hbase.hregion.memstore.mslab.enabled : true,
site.hbase-site.zookeeper.znode.parent : ${DEF_ZK_PATH},
site.global.app_install_dir : ${AGENT_WORK_ROOT}/app/install,
site.hbase-site.zookeeper.session.timeout : 3,
site.fs.default.name : hdfs://OCNoSQLBJ,
site.global.app_user : ocean,
site.hdfs-site.dfs.ha.automatic-failover.enabled : true,

Re: how to get get the port of memcached

2014-12-23 Thread Jon Maron

Are you suggesting that the client interact with the REST API to retrieve 
results (instead of the current rpc mechanism)?  That is part of the plan. 
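
As a rough illustration of what a REST-based client lookup could look like, the sketch below builds the RM-proxied publisher URL using the pattern Gour gives further down this thread (http://yang:8088/proxy/application_id/ws/v1/slider/publisher/slider). The function and parameter names are hypothetical, and the endpoint layout is taken only from that message, not from a documented API.

```python
from urllib.parse import quote


def publisher_url(rm_web_address, application_id, config_set="slider"):
    """Build the RM-proxied URL of the Slider AM publisher endpoint.

    rm_web_address is the ResourceManager web host:port (e.g. "yang:8088");
    application_id is the value of info.am.app.id from `slider status`.
    """
    return "http://{}/proxy/{}/ws/v1/slider/publisher/{}".format(
        rm_web_address,
        quote(application_id, safe=""),
        quote(config_set, safe=""),
    )


if __name__ == "__main__":
    print(publisher_url("yang:8088", "application_1418350976699_0004"))
```

Going through the RM proxy avoids having to know the AM host:port up front, which is the problem discussed in the quoted messages.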

 On Dec 23, 2014, at 1:45 AM, 杨浩 yangha...@gmail.com wrote:
 
 I think one way to do this is to expose a REST API that returns the result
 of the slider shell command.
 
 2014-12-23 14:22 GMT+08:00 Gour Saha gs...@hortonworks.com:
 
 Do you mean REST API?
 
 Significant work is going on in exposing REST API in slider for the next
 major release. We still don't know the best way to expose a REST API to
 retrieve the AM host:port (via YARN REST API maybe) as the REST endpoint
 itself will be served by the Slider AM host:port, but will surely come up
 with an elegant solution. Suggestions are welcome!!
 
 Check the uber jira for more details -
 https://issues.apache.org/jira/browse/SLIDER-151
 
 -Gour
 
 On Mon, Dec 22, 2014 at 1:50 AM, 杨浩 yangha...@gmail.com wrote:
 
 Hi, I got the AM port through the shell command slider list
 +applicationName+ --state RUNNING, but after discussing it with my boss, we
 think that is an ugly approach for a production environment.
 
 Can we get the AM host:port through a Java API?
 
 2014-12-16 9:07 GMT+08:00 Gour Saha gs...@hortonworks.com:
 
 Once the app is up and running can you hit the following url and copy
 paste
 what you see?
 
 http://yang:8088/proxy/application_id/ws/v1/slider/publisher/slider
 
 where application_id is the value of the property info.am.app.id in the
 status output above.
 
 -Gour
 
 On Thu, Dec 11, 2014 at 8:23 PM, 杨浩 yangha...@gmail.com wrote:
 
 yang@yang:/usr/local/slider$ slider status memcached1
 2014-12-12 12:22:58,305 [main] INFO  client.RMProxy - Connecting to
 ResourceManager at yang/127.0.0.1:8032
 2014-12-12 12:22:58,597 [main] INFO  client.SliderClient - {
  version : 1.0,
  name : memcached1,
  type : agent,
  state : 3,
  createTime : 1418357615354,
  updateTime : 1418357615603,
  originConfigurationPath :
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/snapshot,
  generatedConfigurationPath :
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/generated,
  dataPath :
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/database,
  options : {
slider.am.restart.supported : true,
site.global.security_enabled : false,
internal.application.home : null,
internal.queue : default,
application.name : memcached1,
slider.cluster.directory.permissions : 0770,
site.global.slider.allowed.ports : 48000, 49000, 50001-50010,
internal.tmp.dir :
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/tmp,
java_home : /opt/soft/jdk,
internal.snapshot.conf.path :
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/snapshot,
env.MALLOC_ARENA_MAX : 4,
zookeeper.path : /services/slider/users/yang/memcached1,
internal.container.failure.shortlife : 6,
internal.application.image.path : null,
internal.generated.conf.path :
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/generated,
site.fs.default.name : hdfs://yang:8020,
site.global.additional_cp : /usr/lib/hadoop/lib/*,
zookeeper.hosts : 127.0.0.1,
internal.provider.name : agent,
internal.data.dir.path :
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/database,
site.fs.defaultFS : hdfs://yang:8020,
site.global.memory_val : 200M,
slider.data.directory.permissions : 0770,
site.global.listen_port :
 ${MEMCACHED.ALLOCATED_PORT}{PER_CONTAINER},
zookeeper.quorum : 127.0.0.1:2181,
site.global.xmx_val : 256m,
internal.am.tmp.dir :
 hdfs://yang:8020/user/yang/.slider/cluster/memcached1/tmp/appmaster,
application.def :
 .slider/package/MEMCACHED/jmemcached-1.0.0.zip,
internal.container.failure.threshold : 5,
site.global.xms_val : 128m
  },
  info : {
info.am.agent.status.url : https://yang:60422/,
yarn.memory : 2048,
info.am.app.id : application_1418350976699_0004,
info.am.agent.status.port : 60422,
info.am.agent.ops.url : https://yang:47879/,
yarn.vcores : 32,
info.am.container.id :
 container_1418350976699_0004_03_01,
info.am.attempt.id : appattempt_1418350976699_0004_03,
info.am.rpc.port : 48000,
info.am.web.port : 49000,
info.am.web.url : http://yang:49000/,
info.am.hostname : yang,
info.am.agent.ops.port : 47879,
status.application.build.info : Slider Core-0.60.0-incubating
 Built
 against commit# 9e03554f99 on Java 1.6.0_31 by yang,
status.hadoop.build.info : 2.6.0,
status.hadoop.deployed.info : branch-2.6.0
 @18e43357c8f927c0695f1e9522859d6a,
live.time : 12 Dec 2014 04:13:35 GMT,
live.time.millis : 1418357615354,
create.time : 12 Dec 2014 04:13:35 GMT,
create.time.millis : 1418357615354,
containers.at.am-restart : 0,
status.time : 12 Dec 2014 04:22:58 GMT,
status.time.millis : 1418358178437
  },
  statistics : {
MEMCACHED : {
  containers.start.started : 1,
  containers.live : 1,

Re: review board

2014-12-21 Thread Jon Maron

I’ve wondered about that myself.  I’m wondering if anything is really required 
administratively, or can we simply install RBTools and start posting reviews?

On Dec 21, 2014, at 4:28 PM, Ted Yu yuzhih...@gmail.com wrote:

 Hi,
 Who should I contact so that patch for Slider can be added to
 https://reviews.apache.org/r/new/ ?
 
 Cheers


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Application Management and Authorization

2014-12-18 Thread Jon Maron

Hi,

 Is there an existing approach for exposing management functions to users other 
than the application creator?  Most management operations validate the 
existence of a cluster definition in the cluster directory, which is by default 
under the creator’s home directory (e.g. 
/users/username/.slider/cluster/clusterName).  That check will fail if a user 
other than the creator attempts a management operation (e.g. stop) even if an 
authorization policy is in place to allow the operation (assuming RPC ACLs or 
Ranger is leveraged).  Do we:

- Expect users to leverage the “slider.base.path” property to designate 
a location accessible to all management users (with appropriate permissions set 
for that directory)?
- Move the default base path to one accessible to more users (the users 
designated as authorized to manage application instances), e.g. /apps/slider?   

— Jon
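
To make the failure mode concrete, here is a minimal sketch of the path resolution being described. It assumes the cluster definition lives at <base>/cluster/<name>, that the default base is the caller's own /user/<user>/.slider directory (as in the status output earlier in this archive), and that slider.base.path simply replaces that base. The function name is hypothetical and this is not Slider's actual code.

```python
import posixpath


def cluster_dir(cluster_name, user, base_path=None):
    """Return the directory where the cluster definition would be looked up.

    Assumption: with no slider.base.path configured, the base is the calling
    user's home directory; with it, every user resolves the same location.
    """
    base = base_path if base_path else posixpath.join("/user", user, ".slider")
    return posixpath.join(base, "cluster", cluster_name)


if __name__ == "__main__":
    # The creator and another authorized user look in different places by default.
    print(cluster_dir("memcached1", "yang"))
    print(cluster_dir("memcached1", "ops"))
    # A shared base path makes both resolve to the same directory.
    print(cluster_dir("memcached1", "ops", base_path="/apps/slider"))
```

Under these assumptions the existence check fails for the second user before any authorization policy is consulted, which is why either option in the list above would help.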




Re: Application Management and Authorization

2014-12-18 Thread Jon Maron


On Dec 18, 2014, at 1:53 PM, Steve Loughran ste...@hortonworks.com wrote:

 some of these operations could skip the "is the cluster in the path" check;
 it's probably just there for fail-fast.
 
 the AM could be located via YARN rm lookup (today), or via the full YARN
 registry, then (IPC/REST) operations made to it
 
 straightforward: stop, status checks.
 the flex command currently writes the spec to HDFS if the cluster is down;
 it only talks to the AM when the AM is live.

OK - I had similar thoughts. I suppose a JIRA is required.

 
 operations like create and destroy are very FS centric
 
 On 18 December 2014 at 14:54, Jon Maron jma...@hortonworks.com wrote:
 
 Hi,
 
 Is there an existing approach for exposing management functions to users
 other than the application creator?  Most management operations validate
 the existence of a cluster definition in the cluster directory, which is by
 default under the creator’s home directory (e.g.
 /users/username/.slider/cluster/clusterName).  That check will fail if a
 user other than the creator attempts a management operation (e.g. stop)
 even if an authorization policy is in place to allow the operation
 (assuming RPC ACLs or Ranger is leveraged).  Do we:
 
- Expect users to leverage the “slider.base.path” property to
 designate a location accessible to all management users (with appropriate
 permissions set for that directory)?
- Move the default base path to one accessible to more users (the
 users designated as authorized to manage application instances), e.g.
 /apps/slider?
 
 — Jon
 
 
 

Re: [VOTE] Apache Slider Incubating Release 0.60.0-incubating RC1

2014-11-17 Thread Jon Maron

+1

- performed a build
- got a clean run of the full functional test suite against an up to date 
Ambari generated cluster:

Tests run: 54, Failures: 0, Errors: 0, Skipped: 2

-  Full unit test run came up clean:  

Tests run: 420, Failures: 0, Errors: 0, Skipped: 11

— Jon

On Nov 14, 2014, at 2:26 PM, Steve Loughran ste...@hortonworks.com wrote:

 Hi
 
 The updated RC1 release of slider is up for review and vote. Please
 download and review it
 
 
 all changes in this release are listed at:
 https://issues.apache.org/jira/browse/SLIDER/fixforversion/12327198/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel
 
 Main changes since the RC0
 
   - fixed license problem
   - SLIDER-647: allocation requests not being satisfied when a cluster
   goes to labels
   (the default placement policy is now lax; you can request strict on
   an application instance or component-by-component basis)
   - SLIDER-650 regression: local zk nodes not being deleted on instance
   destroy (well spotted Sumit!)
 
 Note that this is a source only release and we are voting on the source.
 
 artifacts at
 http://people.apache.org/~stevel/slider/slider-release-0.60.0-incubating-rc1
 
 source at
 https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;a=shortlog;h=refs/tags/release-0.60.0-incubating-rc0
 
 PGP keys at
 http://pgp.mit.edu:11371/pks/lookup?op=vindex&search=ste...@apache.org
 
  Build instructions at:
 http://slider.incubator.apache.org/developing/building.html
 
Vote will be open for 72 hours
 
[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)
 
 -Steve
 

Re: Do I have to repackage if I change appConfig.json/resources.json

2014-11-14 Thread Jon Maron

Perhaps I’m misunderstanding your question, but in general, if making 
modifications to those files, you can simply create a new application instance 
referencing the new versions of the file from the command line:

./slider create <app name> --template <appConfig path> --resources <resources file path>

The app config references the application package in HDFS, which can be 
pre-seeded using “slider install-package”.
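
For anyone scripting this, the sketch below assembles that command line as an argument list, assuming the long options are spelled --template and --resources. The function name, the example instance name, and the file names are placeholders, and the command is only printed, not executed.

```python
import shlex


def slider_create_args(app_name, app_config, resources, slider="./slider"):
    """Build the argv for creating an application instance from edited JSON files."""
    return [slider, "create", app_name,
            "--template", app_config,
            "--resources", resources]


if __name__ == "__main__":
    args = slider_create_args("memcached2", "appConfig.json", "resources.json")
    # Print a copy-pasteable command line; pass `args` to subprocess.run to execute it.
    print(" ".join(shlex.quote(a) for a in args))
```

Because only the two JSON files are passed on the command line, editing them does not by itself require rebuilding the application package zip.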

— Jon

On Nov 14, 2014, at 2:54 PM, hsy...@gmail.com wrote:

 Every time I change appConfig.json or resources.json, do I have to
 repackage the zip file and redeploy it to HDFS?
 
 Thanks!



Re: Hadoop 2.6.0 being voted on on hadoop common-dev list, what do we need in Hadoop 2.7?

2014-11-12 Thread Jon Maron

Security-wise, perhaps the ability to actually wire-encrypt AM traffic (i.e. AM 
HTTPS support).  This will require management of AM-specific keystores, RM/AM 
trust establishment, etc. (YARN-2554).

— Jon

On Nov 12, 2014, at 9:37 AM, Sumit Mohanty sumit.moha...@gmail.com wrote:

 Java7 is good news.
 
 Let's go through the other open issues and features that we have in SLIDER
 JIRAs and see if any of them are better addressed in Hadoop. Examples:
 anti-affinity (the YARN JIRA exists) and application-instance-specific vmem
 and pmem checks; for instance, if an app is using a label then it has the
 hosts to itself anyway.
 
 On Wed, Nov 12, 2014 at 6:05 AM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 Hadoop 2.6.0 is being voted on in the common-dev list...everyone is
 encouraged to download it and play with it, make a vote. Everyone's
 opinions matter.
 
 Now, what do we want in Hadoop 2.7?
 
 off the top of my head
 
   - finish the registry stuff, including REST API
   - full support of REST APIs in YARN apps with AM Filter and RM Proxy
   supporting other verbs
   - some other things under YARN-896
 
 if we can think of some more I can put them in a "slider update to hadoop
 2.7" JIRA.
 
 BTW, Hadoop 2.7 is Java7+ only
 
 
 
 
 
 -- 
 thanks
 Sumit



Documenting key configuration properties (appConfig, mostly)

2014-11-11 Thread Jon Maron

Hi,

  I’ve started adding some documentation about key configuration properties to 
the incubator site’s core.md file (the core configuration spec).  Perhaps these 
need to be pulled out as a separate document, but for the time being I thought 
I’d let it be known in case you are looking for a mechanism to document new 
configuration options.

— Jon



Re: Memcahed doesn't start with latest slider code

2014-11-05 Thread Jon Maron

It may help to provide any of the agent logs or memcached logs from the node 
managers.  This could occur for any number of reasons, including a wrong 
java_home value.
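
One of those causes can be ruled out quickly on a node manager host. The sketch below checks whether a java_home value (as set in appConfig) points at a directory containing an executable bin/java. The function name is hypothetical, and it has to be run on the host where the container is launched, not on the client.

```python
import os


def java_home_problem(java_home):
    """Return a description of what is wrong with java_home, or None if it looks usable."""
    if not java_home:
        return "java_home is empty"
    if not os.path.isdir(java_home):
        return "java_home is not a directory: " + java_home
    java = os.path.join(java_home, "bin", "java")
    if not (os.path.isfile(java) and os.access(java, os.X_OK)):
        return "no executable bin/java under " + java_home
    return None


if __name__ == "__main__":
    # Value taken from the appConf dump quoted in this thread.
    print(java_home_problem("/usr/lib/jvm/java-7-openjdk-amd64") or "java_home looks usable")
```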

— Jon

On Nov 5, 2014, at 1:55 PM, Pushkar Raste pushkar.ra...@gmail.com wrote:

 Maybe I should provide the entire log
 
 2014-11-05 18:27:24,345 [main] INFO  Configuration.deprecation -
 slider.registry.path is deprecated. Instead, use hadoop.registry.zk.root
 2014-11-05 18:27:24,350 [main] INFO  appmaster.SliderAppMaster - AM
 configuration:
 fs.defaultFS=hdfs://localhost:9000
 hadoop.registry.zk.quorum=localhost:2181
 hadoop.registry.zk.root=/registry
 slider.registry.path=/registry
 slider.yarn.queue=default
 yarn.application.classpath=/usr/local/hadoop/etc/hadoop,/usr/local/hadoop/etc/hadoop/*,/usr/local/hadoop/share/hadoop/common/*,/usr/local/hadoop/share/hadoop/common/lib/*,/usr/local/hadoop/share/hadoop/hdfs/*,/usr/local/hadoop/share/hadoop/hdfs/lib/*,/usr/local/hadoop/share/hadoop/yarn/*,/usr/local/hadoop/share/hadoop/yarn/lib/*,/usr/local/hadoop/share/hadoop/mapreduce/*,/usr/local/hadoop/share/hadoop/mapreduce/lib/*
 yarn.log-aggregation-enable=true
 yarn.resourcemanager.address=localhost:8032
 yarn.resourcemanager.scheduler.address=localhost:8030
 
 2014-11-05 18:27:24,507 [main] INFO  appmaster.SliderAppMaster - Cluster is
 insecure
 2014-11-05 18:27:25,000 [main] INFO  appmaster.SliderAppMaster - Login user
 is root (auth:SIMPLE)
 2014-11-05 18:27:25,013 [openssl-001] INFO  appmaster.SliderAppMaster -
 OpenSSL 1.0.1 14 Mar 2012
 2014-11-05 18:27:25,213 [openssl-001] WARN  appmaster.SliderAppMaster -
 2014-11-05 18:27:25,214 [openssl-001] INFO  appmaster.SliderAppMaster -
 2014-11-05 18:27:25,231 [python-003] WARN  appmaster.SliderAppMaster -
 Python 2.7.3
 2014-11-05 18:27:25,431 [python-003] WARN  appmaster.SliderAppMaster -
 2014-11-05 18:27:25,431 [python-003] INFO  appmaster.SliderAppMaster -
 2014-11-05 18:27:25,434 [main] INFO  appmaster.SliderAppMaster - Slider
 Core-0.51.0-incubating-SNAPSHOT Built against commit# bbde42bdf9 on Java
 1.7.0_67 by praste
 2014-11-05 18:27:25,434 [main] INFO  appmaster.SliderAppMaster - Compiled
 against Hadoop 2.6.0-SNAPSHOT
 2014-11-05 18:27:25,436 [main] INFO  appmaster.SliderAppMaster - Hadoop
 runtime version (detached from b4446cb) with source checksum
 d2d3ea14a0fdbf31a0273fc4f2ad594b and build date 2014-10-29T18:31Z
 2014-11-05 18:27:25,437 [main] INFO  appmaster.SliderAppMaster -
 Application defined at hdfs://localhost:9000/user/root/.slider/cluster/cl2
 2014-11-05 18:27:27,195 [main] INFO  appmaster.SliderAppMaster - Deploying
 cluster {,
 internal: {
  schema : http://example.org/specification/v2.0.0,
  metadata : {
create.hadoop.deployed.info : (detached from b4446cb)
 @d2d3ea14a0fdbf31a0273fc4f2ad594b,
create.application.build.info : Slider
 Core-0.51.0-incubating-SNAPSHOT Built against commit# bbde42bdf9 on Java
 1.7.0_67 by praste,
create.hadoop.build.info : 2.6.0-SNAPSHOT,
create.time.millis : 1415212024534,
create.time : 5 Nov 2014 18:27:04 GMT
  },
  global : {
internal.tmp.dir :
 hdfs://localhost:9000/user/root/.slider/cluster/cl2/tmp,
internal.generated.conf.path :
 hdfs://localhost:9000/user/root/.slider/cluster/cl2/generated,
internal.snapshot.conf.path :
 hdfs://localhost:9000/user/root/.slider/cluster/cl2/snapshot,
internal.container.failure.shortlife : 6,
slider.data.directory.permissions : 0770,
application.name : cl2,
slider.cluster.directory.permissions : 0770,
internal.provider.name : agent,
internal.am.tmp.dir :
 hdfs://localhost:9000/user/root/.slider/cluster/cl2/tmp/appmaster,
internal.container.failure.threshold : 5,
internal.data.dir.path :
 hdfs://localhost:9000/user/root/.slider/cluster/cl2/database
  },
  credentials : { },
  components : { }
 },
 resources: {
  schema : http://example.org/specification/v2.0.0,
  metadata : { },
  global : { },
  credentials : { },
  components : {
slider-appmaster : {
  yarn.memory : 256,
  yarn.vcores : 1,
  yarn.component.instances : 1
},
MEMCACHED : {
  yarn.memory : 256,
  yarn.role.priority : 1,
  yarn.component.instances : 1
}
  }
 },
 appConf :{
  schema : http://example.org/specification/v2.0.0,
  metadata : { },
  global : {
site.fs.default.name : hdfs://localhost:9000,
site.global.app_user : yarn,
site.global.additional_cp : /usr/lib/hadoop/lib/*,
zookeeper.hosts : localhost,
site.global.pid_file : ${AGENT_WORK_ROOT}/app/run/component.pid,
java_home : /usr/lib/jvm/java-7-openjdk-amd64,
site.fs.defaultFS : hdfs://localhost:9000,
env.MALLOC_ARENA_MAX : 4,
zookeeper.path : /services/slider/users/root/cl2,
site.global.memory_val : 200M,
site.global.listen_port :
 ${MEMCACHED.ALLOCATED_PORT}{DO_NOT_PROPAGATE},
zookeeper.quorum : localhost:2181,
site.global.xmx_val : 256m,
site.global.app_root :
 ${AGENT_WORK_ROOT}/app/install/jmemcached-1.0.0,

Re: yes, I've broken jenkins and possibly everyone's builds

2014-11-02 Thread Jon Maron

Well..that explains it… ;)

— Jon

On Nov 2, 2014, at 9:40 AM, Steve Loughran ste...@hortonworks.com wrote:

 I know, I've just broken everyone's builds by merging SLIDER-531 into trunk
 ... I needed to get this in to go with some other things that aren't quite
 checked into apache hadoop right now.
 
 I'll push out a compatible hadoop branch to github later today if I can't
 get the patches into hadoop itself in time
 

Re: Wrong FS: hdfs://localhost:9000/user/root/.slider/cluster/c100, expected: file:/// Issues deploying memcached using slider.

2014-10-27 Thread Jon Maron

I think the problem is probably in your appConfig file (e.g. referencing an 
application package from the local filesystem as opposed to HDFS).  Can you 
attach that?
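
A "Wrong FS ... expected: file:///" message generally means an hdfs:// path was handed to a filesystem object bound to file:///. As a rough aid when reviewing appConfig values, the sketch below reports which scheme a path would resolve against: its own scheme if it has one, otherwise the scheme of the default filesystem. The function name is hypothetical, and the file:/// default is an assumption about what the client falls back to when fs.defaultFS is not picked up.

```python
from urllib.parse import urlparse


def filesystem_of(path, default_fs="file:///"):
    """Return the URI scheme a Hadoop-style path would resolve against."""
    scheme = urlparse(path).scheme
    return scheme if scheme else urlparse(default_fs).scheme


if __name__ == "__main__":
    for p in ("hdfs://localhost:9000/user/root/.slider/cluster/c100",
              ".slider/package/MEMCACHED/jmemcached-1.0.0.zip"):
        print(p, "->", filesystem_of(p))
```

A relative or scheme-less entry resolves differently depending on the default filesystem in effect, so it is worth checking both the appConfig paths and that the client is actually reading the intended fs.defaultFS.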

— Jon

On Oct 27, 2014, at 10:54 AM, Pushkar Raste pushkar.ra...@gmail.com wrote:

 Hi,
 I am trying to deploy the memcached package using slider (0.40.0 version). I
 keep getting a Wrong FS:
 hdfs://localhost:9000/user/root/.slider/cluster/c100, expected: file:///
 error
 
 I saw that 'Rui Zhang' had a similar issue, but she got around it by
 setting SLIDER_CLASSPATH_EXTRA.
 Unfortunately that solution did not work for me. Here is my setup
 1. slider-client.xml
 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at
   http://www.apache.org/licenses/LICENSE-2.0
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
 -->
 <!-- Properties set here are picked up in the client. -->
 <configuration>
   <property>
     <name>slider.client.resource.origin</name>
     <value>conf/slider-client.xml</value>
     <description>This is just for diagnostics</description>
   </property>
   <property>
     <name>yarn.log-aggregation-enable</name>
     <value>true</value>
   </property>
   <property>
     <name>slider.yarn.queue</name>
     <value>default</value>
     <description>YARN queue for the Application Master</description>
   </property>
   <!--
   <property>
     <name>yarn.resourcemanager.address</name>
     <value>master:8032</value>
   </property>
   <property>
     <name>fs.defaultFS</name>
     <value>hdfs://master:9090</value>
   </property>
   <property>
     <name>yarn.resourcemanager.principal</name>
     <value>yarn/master@MINICLUSTER</value>
   </property>
   <property>
     <name>slider.security.enabled</name>
     <value>true</value>
   </property>
   <property>
     <name>dfs.namenode.kerberos.principal</name>
     <value>hdfs/master@MINICLUSTER</value>
   </property>
   -->
   <property>
     <name>yarn.application.classpath</name>
     <value>/usr/local/hadoop/etc/hadoop/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*</value>
   </property>
   <property>
     <name>slider.zookeeper.quorum</name>
     <value>localhost:2181</value>
   </property>
   <property>
     <name>yarn.resourcemanager.address</name>
     <value>localhost:8032</value>
   </property>
   <property>
     <name>yarn.resourcemanager.scheduler.address</name>
     <value>localhost:8030</value>
   </property>
   <property>
     <name>fs.defaultFS</name>
     <value>hdfs://localhost:9000</value>
   </property>
 </configuration>
 
 2. Environment variables in my ~/.bashrc
 export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
 export HADOOP_INSTALL=/usr/local/hadoop
 export PATH=$PATH:$HADOOP_INSTALL/bin
 export PATH=$PATH:$HADOOP_INSTALL/sbin
 export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
 export HADOOP_COMMON_HOME=$HADOOP_INSTALL
 export HADOOP_HDFS_HOME=$HADOOP_INSTALL
 export YARN_HOME=$HADOOP_INSTALL
 export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
 export HADOOP_OPTS=-Djava.library.path=$HADOOP_INSTALL/lib
 export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop
 export SLIDER_CLASSPATH_EXTRA=$HADOOP_CONF_DIR/*:/usr/local/hadoop/etc/hadoop/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*
 export YARN_CONF_DIR=$HADOOP_INSTALL/etc/hadoop
 
 
 
 3. Output when I run 'python slider.py version'
 slider_home = /usr/local/slider/slider-0.40
 slider_jvm_opts = -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -Xmx256m -Djava.confdir=/usr/local/slider/slider-0.40/conf
 slider_classpath = /usr/local/slider/slider-0.40/lib/*:/usr/local/slider/slider-0.40/conf:/usr/local/hadoop/etc/hadoop/*:/usr/local/hadoop/etc/hadoop/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*
 ready to exec : ['java', '-Djava.net.preferIPv4Stack=true', '-Djava.awt.headless=true', '-Xmx256m', '-Djava.confdir=/usr/local/slider/slider-0.40/conf', '-classpath',
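
The slider_classpath printed above contains /usr/local/hadoop/etc/hadoop/* twice, because SLIDER_CLASSPATH_EXTRA starts with $HADOOP_CONF_DIR/* and then lists the same directory again explicitly. A small sketch for spotting repeats like this follows; `duplicate_entries` is a hypothetical helper, and `SLIDER_CLASSPATH` is a shortened copy of the value from the output.

```python
def duplicate_entries(classpath, sep=":"):
    """Return classpath entries that occur more than once, in first-repeat order."""
    seen = set()
    duplicates = []
    for entry in classpath.split(sep):
        if not entry:
            continue  # skip empty segments such as a trailing separator
        if entry in seen and entry not in duplicates:
            duplicates.append(entry)
        seen.add(entry)
    return duplicates


# Shortened copy of the slider_classpath value shown in the output above.
SLIDER_CLASSPATH = (
    "/usr/local/slider/slider-0.40/lib/*:"
    "/usr/local/slider/slider-0.40/conf:"
    "/usr/local/hadoop/etc/hadoop/*:"
    "/usr/local/hadoop/etc/hadoop/*:"
    "/usr/local/hadoop/share/hadoop/common/*"
)

print(duplicate_entries(SLIDER_CLASSPATH))
```

A duplicate entry is harmless in itself. Separately, a Java classpath entry ending in /* matches only JAR files in that directory, so a configuration directory is normally listed without the wildcard; whether that has any bearing on the problem in this thread is not established here.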

Re: Slider-develop - Build # 369 - Failure

2014-10-25 Thread Jon Maron

Looks like a failure based on my commit. Will fix soon. 


 On Oct 25, 2014, at 6:39 PM, Apache Jenkins Server 
 jenk...@builds.apache.org wrote:
 
 The Apache Jenkins build system has built Slider-develop (build #369)
 
 Status: Failure
 
 Check console output at https://builds.apache.org/job/Slider-develop/369/ to 
 view the results.

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Intermittent unit test failures

2014-10-21 Thread Jon Maron

Additional logging from failing unit test:

2014-10-21 10:46:37,976 [Thread-76] INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:serviceStart(465)) - ContainerManager started at 
/192.168.64.1:58613
2014-10-21 10:46:37,976 [Thread-76] INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:serviceStart(466)) - ContainerManager bound to 
HW10386.local/192.168.64.1:0

On Oct 21, 2014, at 11:06 AM, Jonathan Maron jonma...@gmail.com wrote:

 I am getting the following stack trace in random unit tests (when running the 
 full suite):
 
 java.net.BindException: Problem binding to [HW10386.local/192.168.64.1:0] 
 java.net.BindException: Can't assign requested address; For more details see: 
  http://wiki.apache.org/hadoop/BindException
   at sun.nio.ch.Net.connect0(Native Method)
   at sun.nio.ch.Net.connect(Net.java:465)
   at sun.nio.ch.Net.connect(Net.java:457)
   at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
   at 
 org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
   at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
   at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
   at org.apache.hadoop.ipc.Client.call(Client.java:1438)
   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
   at com.sun.proxy.$Proxy26.getApplications(Unknown Source)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:237)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   at com.sun.proxy.$Proxy27.getApplications(Unknown Source)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:430)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:414)
   at 
 org.apache.slider.client.SliderYarnClientImpl.listInstances(SliderYarnClientImpl.java:75)
   at 
 org.apache.slider.client.SliderYarnClientImpl.findAllLiveInstances(SliderYarnClientImpl.java:241)
   at 
 org.apache.slider.core.registry.YarnAppListClient.findAllLiveInstances(YarnAppListClient.java:65)
   at 
 org.apache.slider.client.SliderClient.findAllLiveInstances(SliderClient.java:1856)
   at 
 org.apache.slider.client.SliderClient.verifyNoLiveClusters(SliderClient.java:1524)
   at 
 org.apache.slider.client.SliderClient.buildInstanceDefinition(SliderClient.java:769)
   at 
 org.apache.slider.client.SliderClient.actionBuild(SliderClient.java:661)
   at 
 org.apache.slider.client.SliderClient.actionCreate(SliderClient.java:607)
   at org.apache.slider.client.SliderClient.exec(SliderClient.java:368)
   at 
 org.apache.slider.client.SliderClient.runService(SliderClient.java:339)
   at 
 org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
   at 
 org.apache.slider.test.SliderTestUtils.execSliderCommand(SliderTestUtils.groovy:570)
   at 
 org.apache.slider.test.SliderTestUtils.launchClientAgainstRM(SliderTestUtils.groovy:621)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
   at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
   at groovy.lang.MetaClassImpl.invokeStaticMethod(MetaClassImpl.java:1339)
   at 
 org.codehaus.groovy.runtime.callsite.StaticMetaClassSite.callStatic(StaticMetaClassSite.java:62)
   at 
 org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallStatic(CallSiteArray.java:53)
   at 
 org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:157)
   at 
 org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:173)

Re: Intermittent unit test failures

2014-10-21 Thread Jon Maron


On Oct 21, 2014, at 11:21 AM, Sumit Mohanty sumit.moha...@gmail.com wrote:

 I have seen similar issues where, at home, localhost was being mapped to
 the router's IP and port allocation was failing. I think I just modified
 /etc/hosts to explicitly map localhost to 127.0.0.1.
 
 In your case, which device has this IP, 192.168.64.1?

I couldn't actually find a mapped device.  I checked my /etc/hosts and I have 
HW10386.home mapped to the local box, but the name now appears to be 
HW10386.local.  I added the mapping for the new name and am re-running...
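
The check described here (is the current hostname mapped in /etc/hosts, and can the resulting address be bound locally) might be sketched as follows. This is only an illustration: `parse_hosts` and `can_bind` are hypothetical helpers, and `SAMPLE_HOSTS` is made-up content that mimics the stale HW10386.home entry mentioned above.

```python
import socket


def parse_hosts(text):
    """Map each hostname or alias in /etc/hosts-style text to its first-listed address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, address)
    return mapping


def can_bind(address):
    """Return True if an ephemeral TCP port can be bound on the given IPv4 address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, 0))
        except OSError:
            return False
    return True


# Made-up hosts content: the old name is mapped, the new one is not.
SAMPLE_HOSTS = """\
127.0.0.1   localhost
127.0.0.1   HW10386.home   # stale name
"""

hosts = parse_hosts(SAMPLE_HOSTS)
for name in ("HW10386.home", "HW10386.local"):
    print(name, "->", hosts.get(name, "not mapped"))
```

On the affected machine, `parse_hosts(open("/etc/hosts").read())` would show whether `socket.gethostname()` has an entry, and `can_bind` could then be tried against the address that name resolves to.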

 

Re: Intermittent unit test failures

2014-10-21 Thread Jon Maron

Thanks!  That was the issue.

— Jon

On Oct 21, 2014, at 11:29 AM, Jon Maron jma...@hortonworks.com wrote:

 
 On Oct 21, 2014, at 11:21 AM, Sumit Mohanty sumit.moha...@gmail.com wrote:
 
 I have seen similar issues where, at home, localhost was being mapped to
 the router's IP and port allocation was failing. I think I just modified
 /etc/hosts to explicitly map localhost to 127.0.0.1.
 
 In your case, which device has this IP, 192.168.64.1?
 
 I couldn't actually find a mapped device.  I checked my /etc/hosts and I have 
 HW10386.home mapped to the local box, but the name now appears to be 
 HW10386.local.  I added the mapping for the new name and am re-running...
 
 

Re: new committer

2014-09-27 Thread Jon Maron

Welcome!

Going Mobile


 On Sep 27, 2014, at 9:13 AM, Billie Rinaldi billie.rina...@gmail.com wrote:
 
 Welcome Gour Saha, a new committer and PPMC member for Apache Slider!
 
 Billie


Re: --queue as an input parameter

2014-09-20 Thread Jon Maron


On Sep 20, 2014, at 5:26 PM, Sumit Mohanty sumit.moha...@gmail.com wrote:

 Actually, update may not be a bad idea.
 
 Do we allow any updates today to configs internal/appConfig/resources?
 

I know we allow for appConfig updates - I’ve done so to update the app master 
jvm.opts value.

 On Sat, Sep 20, 2014 at 2:18 PM, Jon Maron jma...@hortonworks.com wrote:
 
 
 On Sep 20, 2014, at 5:11 PM, Sumit Mohanty smoha...@apache.org wrote:
 
 Currently, the queue to submit an application is defined in
 slider-client.xml. It is possible that the same deployment of
 slider-client can be used to submit to different queues.
 
 We can add --queue as a parameter whose value will override the
 default in slider-client.xml.
 
 Going further, the queue value can also be persisted in the internal.json
 file in HDFS for future starts after stops.
 
 If we go the above route, should we also allow overriding the stored
 value of queue when issuing a start? I am thinking of scenarios where the
 queue configuration might have changed and the application needs to be
 restarted with a different queue.
 
 May not be a bad idea, but it does seem similar to the “update” scenarios
 we support.  Should it just be something else we can “update”?
 
 
 -Sumit
 
 
 
 
 
 
 -- 
 thanks
 Sumit
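
The precedence being discussed in this thread could be sketched roughly as below. This reflects one reading of the proposal, not an implemented Slider behavior: a --queue argument wins, then a value persisted with the application, then the default from slider-client.xml. The function name `resolve_queue` and the queue names used in the example are hypothetical.

```python
def resolve_queue(cli_queue=None, stored_queue=None, client_default="default"):
    """Pick the YARN queue: command-line value, then persisted value, then client default."""
    for candidate in (cli_queue, stored_queue, client_default):
        if candidate:
            return candidate
    raise ValueError("no YARN queue configured")


# First create with no --queue: falls back to the slider-client.xml default.
print(resolve_queue())
# Later start with a persisted queue and no override.
print(resolve_queue(stored_queue="etl"))
# Start with an explicit override after the queue configuration changed.
print(resolve_queue(cli_queue="adhoc", stored_queue="etl"))
```

Treating the stored value as something that can be changed through an "update" instead, as suggested above, would leave this lookup unchanged and only alter how `stored_queue` gets written.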



Re: slider create cl1 YARN AM container exception

2014-09-11 Thread Jon Maron

Which version of python is running on your hosts?  The code is simply trying to 
execute 'python --version', and it appears that the python version you're using 
indicates that it doesn't support that option.  Can you try the same call from 
a shell on the AM host?

— Jon
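
A rough Python equivalent of the check described above, useful for trying the same call outside the AM, might look like this. It is a sketch and not Slider's code (which is Java): `exec_command` is a hypothetical helper, and it runs the current interpreter via `sys.executable` rather than whatever `python` resolves to on the AM host.

```python
import subprocess
import sys


def exec_command(args, expected_exit=0, timeout_s=5.0):
    """Run a command and return its combined output; raise if the exit code differs."""
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # older interpreters print the version to stderr
        universal_newlines=True,
        timeout=timeout_s,
    )
    if result.returncode != expected_exit:
        raise RuntimeError(
            "%s failed: expected exit code=%d, actual exit code=%d\n%s"
            % (args[0], expected_exit, result.returncode, result.stdout)
        )
    return result.stdout.strip()


version_output = exec_command([sys.executable, "--version"])
print(version_output)
```

On the AM host, replacing `sys.executable` with the string "python" and running the script as the user that owns the YARN container may show whether a different interpreter is picked up there than in an interactive shell.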

On Sep 11, 2014, at 9:54 AM, 牛兆捷 nzjem...@gmail.com wrote:

 *YARN accepts only one application, and the AppMaster container fails
 very soon after it starts.*
 
 *The error log of AM container:*
 
 14/09/11 21:07:24 WARN util.NativeCodeLoader: Unable to load native-hadoop
 library for your platform... using builtin-java classes where applicable
 14/09/11 21:07:25 INFO appmaster.SliderAppMaster: Login user is hustnn
 (auth:SIMPLE)
 14/09/11 21:07:25 WARN appmaster.SliderAppMaster:
 14/09/11 21:07:25 INFO appmaster.SliderAppMaster:
 14/09/11 21:07:25 INFO appmaster.SliderAppMaster: OpenSSL 0.9.8e-fips-rhel5
 01 Jul 2008
 14/09/11 21:07:25 WARN appmaster.SliderAppMaster:
 14/09/11 21:07:25 WARN appmaster.SliderAppMaster: Unknown option: --
 14/09/11 21:07:25 WARN appmaster.SliderAppMaster: usage: python [option]
 ... [-c cmd | -m mod | file | -] [arg] ...
 14/09/11 21:07:25 WARN appmaster.SliderAppMaster: Try `python -h' for more
 information.
 14/09/11 21:07:25 INFO appmaster.SliderAppMaster:
 14/09/11 21:07:25 INFO service.AbstractService: Service python failed in
 state STARTED; cause: org.apache.slider.core.main.ServiceLaunchException:
 python failed with code 2
 org.apache.slider.core.main.ServiceLaunchException: python failed with code
 2
at
 org.apache.slider.server.services.workflow.ForkedProcessService.reportFailure(ForkedProcessService.java:202)
at
 org.apache.slider.server.services.workflow.ForkedProcessService.onProcessExited(ForkedProcessService.java:192)
at
 org.apache.slider.server.services.workflow.LongLivedProcess.run(LongLivedProcess.java:345)
at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
 14/09/11 21:07:25 WARN tools.SliderUtils: Expected exit code={0}, actual
 exit code={2}
 14/09/11 21:07:25 INFO tools.SliderUtils: [ERR]
 14/09/11 21:07:25 INFO tools.SliderUtils: [ERR] Unknown option: --
 14/09/11 21:07:25 INFO tools.SliderUtils: [ERR] usage: python [option] ...
 [-c cmd | -m mod | file | -] [arg] ...
 14/09/11 21:07:25 INFO tools.SliderUtils: [ERR] Try `python -h' for more
 information.
 14/09/11 21:07:25 INFO tools.SliderUtils: [OUT]
 14/09/11 21:07:25 INFO service.AbstractService: Service SliderAppMaster
 failed in state INITED; cause:
 org.apache.slider.core.exceptions.SliderException: Process python failed:
 Expected exit code={0}, actual exit code={2}
 org.apache.slider.core.exceptions.SliderException: Process python failed:
 Expected exit code={0}, actual exit code={2}
at
 org.apache.slider.common.tools.SliderUtils.execCommand(SliderUtils.java:1744)
at
 org.apache.slider.common.tools.SliderUtils.validateSliderServerEnvironment(SliderUtils.java:1777)
at
 org.apache.slider.server.appmaster.SliderAppMaster.serviceInit(SliderAppMaster.java:405)
at
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at
 org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:180)
at
 org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:471)
at
 org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:401)
at
 org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:626)
at
 org.apache.slider.server.appmaster.SliderAppMaster.main(SliderAppMaster.java:1897)
 Exception: org.apache.slider.core.exceptions.SliderException: Process
 python failed: Expected exit code={0}, actual exit code={2}
 14/09/11 21:07:25 ERROR main.ServiceLauncher: Exception:
 org.apache.slider.core.exceptions.SliderException: Process python failed:
 Expected exit code={0}, actual exit code={2}
 org.apache.hadoop.service.ServiceStateException:
 org.apache.slider.core.exceptions.SliderException: Process python failed:
 Expected exit code={0}, actual exit code={2}
at
 org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
at
 org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:180)
at
 org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:471)
at

Re: slider create cl1 YARN AM container exception

2014-09-11 Thread Jon Maron

The only way I can duplicate what you’re seeing is as follows:

HW10386:bin jmaron$ python 
Unknown option: --
usage: python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Try `python -h' for more information.
HW10386:bin jmaron$ echo $?
2

But I can’t figure out how that could possibly happen in this case.  Anyone 
else?

— Jon

On Sep 11, 2014, at 10:43 AM, Sumit Mohanty sumit.moha...@gmail.com wrote:

 What's the OS you are using, and what is the version of the JDK?
 
 This error, as Jon said, is happening because of the following line:
     execCommand(PYTHON, 0, 5000, logger, "Python", PYTHON, "--version");
 which executes python --version.
 
 
 On Thu, Sep 11, 2014 at 7:05 AM, 牛兆捷 nzjem...@gmail.com wrote:
 
 Python 2.7.
 
 I tried the same call from a shell on the AM host and it works well.
 
 2014-09-11 22:01 GMT+08:00 Jon Maron jma...@hortonworks.com:
 
 Which version of python is running on your hosts?  The code is simply
 trying to execute 'python --version', and it appears that the python
 version you're using indicates that it doesn't support that option.  Can
 you try the same call from a shell on the AM host?
 
 — Jon
 

Re: Error with Slider -Memcached - ca.config does not seem to be generated

2014-09-10 Thread Jon Maron

In this instance you could probably simply delete the /tmp/work directory and 
restart.  A fix for this issue was recently committed (sorry, I don't recall 
the exact commit ID or date).

— Jon
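
Before deleting the directory, it may be worth confirming which generated files are actually missing. The sketch below is illustrative only: `missing_security_files` is a hypothetical helper, and `EXPECTED_FILES` is assembled from the file names that appear in the log quoted below, not from an authoritative list of what Slider generates.

```python
import os
import tempfile

# File names taken from the openssl commands in the quoted log (an assumption,
# not a complete or authoritative list).
EXPECTED_FILES = ("ca.key", "ca.csr", "ca.config", "ca.crt", "keystore.p12")


def missing_security_files(security_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from security_dir."""
    present = set(os.listdir(security_dir)) if os.path.isdir(security_dir) else set()
    return [name for name in expected if name not in present]


# Demo: recreate the files shown in the directory listing, then check what is missing.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("pass.txt", "ca.key", "ca.csr"):
        open(os.path.join(demo_dir, name), "w").close()
    demo_missing = missing_security_files(demo_dir)

print(demo_missing)
```

Against the real directory this would be called as `missing_security_files("/tmp/work/security")`.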

On Sep 10, 2014, at 4:23 AM, Shivaji Dutta sdu...@hortonworks.com wrote:

 Hello All,
 
 I am trying to set up a Memcached instance with Slider, and when I submit the 
 job it fails.
 
 It essentially looks like the ca.config file is not getting generated. Any 
 thoughts on what I may be doing wrong?
 
 Shivaji
 
 [root@shivajidemo security]# ls -ltr
 total 16
 drwx--. 3 yarn hadoop 4096 Aug 28 03:28 db
 -rw-r--r--. 1 yarn hadoop   50 Sep  7 23:05 pass.txt
 -rw-r--r--. 1 yarn hadoop 3311 Sep  7 23:20 ca.key
 -rw-r--r--. 1 yarn hadoop 1647 Sep  7 23:20 ca.csr
 
 
 
 http://mail-archives.apache.org/mod_mbox/incubator-slider-dev/201407.mbox/%3CJIRA.12729067.1406082077987.27102.1406096918325@arcas%3E
 
 
 14/09/07 23:20:27 INFO security.CertificateManager: Generation of server 
 certificate
 14/09/07 23:20:29 INFO security.SecurityUtils: Command openssl genrsa -des3 
 -passout pass: -out /tmp/work/security/ca.key 4096  was finished with 
 exit code: 0 - the operation was completed successfully.
 14/09/07 23:20:29 INFO security.SecurityUtils: Command openssl req -passin 
 pass: -new -key /tmp/work/security/ca.key -out /tmp/work/security/ca.csr 
 -batch was finished with exit code: 0 - the operation was completed 
 successfully.
 14/09/07 23:20:29 WARN security.SecurityUtils: Command open ca 
 -create_serial -out /tmp/work/security/ca.crt -days 365 -keyfile 
 /tmp/work/security/ca.key 
 -key uIUzmK27I0QordSPjMnNMRLGhMawoAQDuarijQFvFJuA487TVX -selfsign -extensions 
 jdk7_ca -config /tmp/work/security/ca.config -batch -infiles 
 /tmp/work/security/ca.csr was finished with exit code: 1 - an error occurred 
 parsing the command options.
 14/09/07 23:20:29 WARN security.SecurityUtils: Command openssl pkcs12 -export 
 -in /tmp/work/security/ca.crt -inkey /tmp/work/security/ca.key -certfile 
 /tmp/work/security/ca.crt -out /tmp/work/security/keystore.p12 -password 
 pass: -passin pass: 
 
 Sep 07, 2014 11:20:30 PM 
 com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
 INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
 14/09/07 23:20:33 WARN mortbay.log: failed 
 SslSelectChannelConnector@0.0.0.0:0: java.io.FileNotFoundException: 
 /tmp/work/security/keystore.p12 (No such file or directory)
 14/09/07 23:20:33 WARN mortbay.log: failed 
 SslSelectChannelConnector@0.0.0.0:0: java.io.FileNotFoundException: 
 /tmp/work/security/keystore.p12 (No such file or directory)
 14/09/07 23:20:33 WARN mortbay.log: failed Server@38247deb: 
 org.mortbay.util.MultiException[java.io.FileNotFoundException: 
 /tmp/work/security/keystore.p12 (No such file or directory), 
 java.io.FileNotFoundException: /tmp/work/security/keystore.p12 (No such file 
 or directory)]
 14/09/07 23:20:33 ERROR agent.AgentWebApp: Unable to start agent server
 org.mortbay.util.MultiException[java.io.FileNotFoundException: 
 /tmp/work/security/keystore.p12 (No such file or directory), 
 java.io.FileNotFoundException: /tmp/work/security/keystore.p12 (No such file 
 or directory)]
   at org.mortbay.jetty.Server.doStart(Server.java:188)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.apache.slider.server.appmaster.web.rest.agent.AgentWebApp$Builder.start(AgentWebApp.java:102)
   at org.apache.slider.server.appmaster.SliderAppMaster.startAgentWebApp(SliderAppMaster.java:755)
   at org.apache.slider.server.appmaster.SliderAppMaster.createAndRunCluster(SliderAppMaster.java:605)
   at org.apache.slider.server.appmaster.SliderAppMaster.runService(SliderAppMaster.java:421)
   at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:186)
   at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:474)
   at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:405)
   at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:629)
   at org.apache.slider.server.appmaster.SliderAppMaster.main(SliderAppMaster.java:1613)
 java.io.FileNotFoundException: /tmp/work/security/keystore.p12 (No such file 
 or directory)
 
 
 -- 
 CONFIDENTIALITY NOTICE
 NOTICE: This message is intended for the use of the individual or entity to 
 which it is addressed and may contain information that is confidential, 
 privileged and exempt from disclosure under applicable law. If the reader 
 of this message is not the intended recipient, you are hereby notified that 
 any printing, copying, dissemination, distribution, disclosure or 
 forwarding of this communication is strictly prohibited. If you have 
 received this communication in error, please contact the sender immediately 
 and delete it from your system. Thank You.


--

Re: Self containing all Slider app artifacts

2014-08-30 Thread Jon Maron


On Aug 30, 2014, at 2:55 PM, Sumit Mohanty smoha...@hortonworks.com wrote:

 Should we also move slider-client.tar.gz and application package into the
 application instance specific folder in HDFS? This along with the changes
 due to SLIDER-330 will likely make the application bits completely self
 contained.

Maybe I’ve been buried in security related work for too long so…

1)  slider-client.tar.gz?  Do you mean slider-agent.tar.gz or is there a client 
installer?
2)  what is the “application instance specific folder”?  
hdfs://user/username/slider or the like?
3)  Do we really want to group Slider/AM artifacts and various application 
artifacts together?  Perhaps under a shared parent directory rather than the 
same directory?

 
 -Sumit
 



Re: Few changes/improvements to the packaging structure

2014-08-21 Thread Jon Maron


On Aug 21, 2014, at 5:12 PM, Sumit Mohanty smoha...@hortonworks.com wrote:

 There are some changes to the packaging structure based on some of the
 recent JIRA fixes. Let me know if you have any suggestions/objections
 regarding these and I will incorporate them before moving the change to
 develop.
 
 
   1. Ability to get the default configuration from the supplied default
      config file (SLIDER-338) - *Not a breaking change*
      1. You now need to specify in appConfig.json only the configuration
      that differs from the default, because the config property bag is
      initialized with the values from the default config file.
 
   2. Ability to supply a folder with files that will be used as is
      (SLIDER-316) - *Not a breaking change*
      1. If there are config files that are not yet supported by a Slider
      package, you can supply the config files as an input (provide the
      location of the folder containing the files) and the files will be
      available locally in the container. It's up to the package author to
      decide what to do with them. For example, the packages for HBase and
      Storm copy the supplied config files as is, but the files that are
      explicitly created overwrite any supplied file.

Just to be clear:  that’s a folder in the app package tar?

 
   3. Ability to provide the full content for files such as log4j or
      env.sh (SLIDER-315) - *Not a breaking change*
      1. The difference between this and the above is that you can still
      supply a template as the file content and have the template
      materialized by the scripts based on other config properties. This is
      useful if you want to provide a custom env.sh file but still retain
      the ability to auto-fill some properties.
 
   4. Metainfo explicitly defines the config files needed by the
      application instance (SLIDER-346) - *breaking change*
      1. It's a breaking change because the config_types property in
      appConfig.json is no longer consulted to find out which config
      property types are expected by the app command scripts. Instead, the
      metainfo is read for the list of supported config types.

Is “no config types found in metainfo” an error condition with a message that 
will alert users?  Or are apps with no config types supported and therefore 
this error message can’t really be generated?  Perhaps the existence of the 
property in appConfig should trigger an error (or does it already)?
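
To make the question concrete, here is a minimal sketch of the kind of check being asked about. It is purely illustrative and is not Slider's actual behavior: the function names and messages are hypothetical, and because the exact location of config_types inside appConfig.json is not stated in this thread, the sketch simply searches the whole structure for the key.

```python
def find_key(node, key):
    """Return True if `key` appears anywhere in a nested dict/list structure."""
    if isinstance(node, dict):
        return key in node or any(find_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(find_key(v, key) for v in node)
    return False


def check_config_types(app_config, metainfo_config_types):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    # A leftover config_types entry is silently ignored after SLIDER-346,
    # so flagging it is one way to alert users (hypothetical policy).
    if find_key(app_config, "config_types"):
        problems.append("appConfig.json still sets config_types; it is no longer read")
    # Whether an empty list should be an error or a supported case is
    # exactly the open question; here it is only reported.
    if not metainfo_config_types:
        problems.append("metainfo declares no config types")
    return problems
```

Whether either condition should fail the launch or only log a warning is a design decision for the package format and is left open here.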

 
 Close to the next release (post 0.50 RC that is out now), I will modify the
 doc on application configuration to add these details.
 
 -Sumit
 



Re: ssl error after changing to slider-dev branch

2014-08-20 Thread Jon Maron

That looks fine.  Can you do the following:

- Look for logging statements in the AM log like the following:

14/08/20 15:56:04 INFO mortbay.log: Started 
SslSelectChannelConnector@0.0.0.0:55882
14/08/20 15:56:05 INFO mortbay.log: Started 
SslSelectChannelConnector@0.0.0.0:48592

  The connection attempt in your agent-side log should be to one of these ports 
(probably the first).

  If it’s neither of those then you’re probably trying to connect to the AM port, 
which is reported in the following log statement:

14/08/20 15:56:05 INFO http.HttpServer2: Jetty bound to port 33238
 
  If it’s an attempt to that latter port then you apparently haven’t picked up 
the fix for SLIDER-333.
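
The comparison described above can be done mechanically. The sketch below is an illustration only: the function name `classify_agent_target` and its return labels are hypothetical, and the regular expressions assume the log line formats shown in this message.

```python
import re
from urllib.parse import urlparse

# Patterns assume the AM log formats quoted in this thread. \s+ tolerates the
# line wrapping introduced by mail clients between "Started" and the connector.
SSL_CONNECTOR = re.compile(r"Started\s+SslSelectChannelConnector@[\d.]+:(\d+)")
JETTY_BOUND = re.compile(r"Jetty bound to port (\d+)")


def classify_agent_target(am_log_text, agent_url):
    """Say which AM listener the agent URL's port corresponds to."""
    ssl_ports = [int(p) for p in SSL_CONNECTOR.findall(am_log_text)]
    am_ports = [int(p) for p in JETTY_BOUND.findall(am_log_text)]
    port = urlparse(agent_url).port
    if port in ssl_ports:
        return "ssl-connector"   # expected target for the agent
    if port in am_ports:
        return "am-web-port"     # suggests the SLIDER-333 fix is missing
    return "unknown"
```

Feed it the AM log text and the URL from the agent's "Connecting to the following url" line; an "am-web-port" result corresponds to the case described above where the agent is talking to the wrong port.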

— Jon

On Aug 20, 2014, at 12:17 PM, Rui Zhang rzh...@vertica.com wrote:

 Hi, Jon,
 
 I tried the new version but the error still exists.
 
 I attached my slider-client.xml. Is there something wrong with the 
 configuration?
 
 Rui
 On 08/20/2014 10:05 AM, Jon Maron wrote:
 Hi,
 
   A fix for SLIDER-333 has just been merged; it addresses some issues that 
 appear to be similar.  You may want to check out the latest from the develop 
 branch and see if that works better.
 
 — Jon
 
 On Aug 19, 2014, at 4:41 PM, Rui Zhang rzh...@vertica.com wrote:
 
 Log attached. This is generated when I run the command logger example.
 
 On 08/19/2014 03:38 PM, Jon Maron wrote:
 I guess send the full agent and AM logs - just this morning I set up a 
 cluster with no issue (admittedly this was on CentOS 6.4)
 
 — Jon
 
 On Aug 19, 2014, at 3:20 PM, Rui Zhang rzh...@vertica.com wrote:
 
 I think it is one-way ssl because I didn't set ssl.server.client.auth to 
 true.
 
 BTW, I am using openssl 1.0.1f, Ubuntu14.04 and Hadoop 2.4.0.
 
 Rui
 
 On 08/19/2014 03:06 PM, Jon Maron wrote:
 I’m going to attempt this sort of deployment (I’m assuming you’re 
 attempting two way SSL?) this afternoon (finishing up my current patch) 
 and see if I can recreate the issue.
 
 — Jon
 
 On Aug 19, 2014, at 2:52 PM, Rui Zhang rzh...@vertica.com wrote:
 
 Tried so many methods.
 Changing the signature algorithm to sha256 in the java code and adding 
 the cert to trusted list.
 
 All does not work and the same error.
 
 The certificate is generated in /tmp/work/security so I don't know what 
 is wrong. Is there a self-check test for me to know whether I configure 
 correctly or not?
 
 Thanks.
 
 On 08/15/2014 12:14 PM, Jon Maron wrote:
 OK.  Make sure the AM logs indicate that the openssl commands are 
 succeeding.  You should see log statements displaying some openssl 
 command or statements indicating if the server certificate exists.
 
 On Aug 15, 2014, at 12:05 PM, Rui Zhang rzh...@vertica.com wrote:
 
 Having done all of these but still got this error. It also says that 
 it is not verified when I opened the link in the browser.
 
 Maybe there is some issue with my openssl. I will try to solve and 
 report to you my progress.
 
 Thanks.
 
 
 On 08/15/2014 11:17 AM, Jon Maron wrote:
 - the agent code has been modified to communicate via SSL.  That 
 code is downloaded to each launched container from /slider/agent 
 HDFS folder (slider-agent.tar.gz).  If you have installed an up to 
 date version of slider you’ll need to update that file in HDFS.
 -- 
 Rui Zhang
 Software engineer Intern
 Vertica, an HP Company
 rzh...@vertica.com
 
 slider.tar.gz
 
 
 slider-client.xml



Re: ssl error after changing to slider-dev branch

2014-08-20 Thread Jon Maron

Try adding the following to your appConfig.json:

  "components": {
    . . .
  },
  "slider-appmaster": {
    "jvm.heapsize": "256M",
    "jvm.opts": "-Djavax.net.debug=all"
  },

You should see some output from the protocol handshake, probably in 
slider-out.txt.  Maybe that’ll tell us more about the nature of the SSL issue 
in your environment.

— Jon

On Aug 20, 2014, at 1:37 PM, Rui Zhang rzh...@vertica.com wrote:

 I found:
 
 14/08/20 11:37:41 INFO mortbay.log: Started 
 SslSelectChannelConnector@0.0.0.0:36655
 14/08/20 11:37:41 INFO mortbay.log: Started 
 SslSelectChannelConnector@0.0.0.0:35861
 
 in am log.
 
 And
 
 INFO 2014-08-20 15:38:21,060 NetUtil.py:38 - Connecting to the following url 
 https://rzhang-HP-ZBook-15:36655/ws/v1/slider/agents/
 ERROR 2014-08-20 15:38:21,065 NetUtil.py:52 - [Errno 8] _ssl.c:510: EOF 
 occurred in violation of protocol
 ERROR 2014-08-20 15:38:21,065 NetUtil.py:54 - SSLError: Failed to connect. 
 Please check openssl library versions.
 Refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1022468 for more 
 details.
 INFO 2014-08-20 15:38:21,065 NetUtil.py:76 - Server at 
 https://rzhang-HP-ZBook-15:36655/ws/v1/slider/agents/ is not reachable, 
 sleeping for 10 seconds...
 
 in the agent log.
 It picked the first port. So I think I have already applied the fix.
 
 Thanks.
 Rui
 On 08/20/2014 12:39 PM, Jon Maron wrote:
 that looks fine.  Can you do the following:
 
 - Look for logging statements in the AM log like the following:
 
 14/08/20 15:56:04 INFO mortbay.log: Started 
 SslSelectChannelConnector@0.0.0.0:55882
 14/08/20 15:56:05 INFO mortbay.log: Started 
 SslSelectChannelConnector@0.0.0.0:48592
 
   The connection attempt in your agent-side log should be to one of these 
 ports (probably the first).
 
   If it’s neither of those then you’re probably trying to connect to the AM 
 port, which is reported in the following log statement:
 
 14/08/20 15:56:05 INFO http.HttpServer2: Jetty bound to port 33238
 If it’s an attempt to that latter port then you apparently haven’t 
 picked up the fix for SLIDER-333.
 
 — Jon
 
 On Aug 20, 2014, at 12:17 PM, Rui Zhang rzh...@vertica.com wrote:
 
 Hi, Jon,
 
 I tried the new version but the error still exists.
 
 I attached my slider-client.xml. Is there something wrong with the 
 configuration?
 
 Rui
 On 08/20/2014 10:05 AM, Jon Maron wrote:
 Hi,
 
   A fix for SLIDER-333 has just been merged; it addresses some issues that 
 appear to be similar.  You may want to check out the latest from the develop 
 branch and see if that works better.
 
 — Jon
 
 On Aug 19, 2014, at 4:41 PM, Rui Zhang rzh...@vertica.com wrote:
 
 Log attached. This is generated when I run the command logger example.
 
 On 08/19/2014 03:38 PM, Jon Maron wrote:
 I guess send the full agent and AM logs - just this morning I set up a 
 cluster with no issue (admittedly this was on CentOS 6.4)
 
 — Jon
 
 On Aug 19, 2014, at 3:20 PM, Rui Zhang rzh...@vertica.com wrote:
 
 I think it is one-way ssl because I didn't set ssl.server.client.auth 
 to true.
 
 BTW, I am using openssl 1.0.1f, Ubuntu14.04 and Hadoop 2.4.0.
 
 Rui
 
 On 08/19/2014 03:06 PM, Jon Maron wrote:
 I’m going to attempt this sort of deployment (I’m assuming you’re 
 attempting two way SSL?) this afternoon (finishing up my current 
 patch) and see if I can recreate the issue.
 
 — Jon
 
 On Aug 19, 2014, at 2:52 PM, Rui Zhang rzh...@vertica.com wrote:
 
 Tried so many methods.
 Changing the signature algorithm to sha256 in the java code and 
 adding the cert to trusted list.
 
 All does not work and the same error.
 
 The certificate is generated in /tmp/work/security so I don't know 
 what is wrong. Is there a self-check test for me to know whether I 
 configure correctly or not?
 
 Thanks.
 
 On 08/15/2014 12:14 PM, Jon Maron wrote:
 OK.  Make sure the AM logs indicate that the openssl commands are 
 succeeding.  You should see log statements displaying some openssl 
 command or statements indicating if the server certificate exists.
 
 On Aug 15, 2014, at 12:05 PM, Rui Zhang rzh...@vertica.com wrote:
 
 Having done all of these but still got this error. It also says 
 that it is not verified when I opened the link in the browser.
 
 Maybe there is some issue with my openssl. I will try to solve and 
 report to you my progress.
 
 Thanks.
 
 
 On 08/15/2014 11:17 AM, Jon Maron wrote:
 - the agent code has been modified to communicate via SSL.  That 
 code is downloaded to each launched container from /slider/agent 
 HDFS folder (slider-agent.tar.gz).  If you have installed an up to 
 date version of slider you’ll need to update that file in HDFS.

Re: ssl error after changing to slider-dev branch

2014-08-19 Thread Jon Maron

I’m going to attempt this sort of deployment (I’m assuming you’re attempting 
two way SSL?) this afternoon (finishing up my current patch) and see if I can 
recreate the issue.

— Jon

On Aug 19, 2014, at 2:52 PM, Rui Zhang rzh...@vertica.com wrote:

 Tried so many methods.
 Changing the signature algorithm to sha256 in the java code and adding the 
 cert to trusted list.
 
 All does not work and the same error.
 
 The certificate is generated in /tmp/work/security so I don't know what is 
 wrong. Is there a self-check test for me to know whether I configure 
 correctly or not?
 
 Thanks.
 
 On 08/15/2014 12:14 PM, Jon Maron wrote:
 OK.  Make sure the AM logs indicate that the openssl commands are 
 succeeding.  You should see log statements displaying some openssl command 
 or statements indicating if the server certificate exists.
 
 On Aug 15, 2014, at 12:05 PM, Rui Zhang rzh...@vertica.com wrote:
 
 Having done all of these but still got this error. It also says that it is 
 not verified when I opened the link in the browser.
 
 Maybe there is some issue with my openssl. I will try to solve and report 
 to you my progress.
 
 Thanks.
 
 
 On 08/15/2014 11:17 AM, Jon Maron wrote:
 - the agent code has been modified to communicate via SSL.  That code is 
 downloaded to each launched container from /slider/agent HDFS folder 
 (slider-agent.tar.gz).  If you have installed an up to date version of 
 slider you’ll need to update that file in HDFS.



Re: ssl error after changing to slider-dev branch

2014-08-19 Thread Jon Maron

I guess send the full agent and AM logs - just this morning I set up a cluster 
with no issue (admittedly this was on CentOS 6.4)

— Jon

On Aug 19, 2014, at 3:20 PM, Rui Zhang rzh...@vertica.com wrote:

 I think it is one-way ssl because I didn't set ssl.server.client.auth to true.
 
 BTW, I am using openssl 1.0.1f, Ubuntu14.04 and Hadoop 2.4.0.
 
 Rui
 
 On 08/19/2014 03:06 PM, Jon Maron wrote:
 I’m going to attempt this sort of deployment (I’m assuming you’re attempting 
 two way SSL?) this afternoon (finishing up my current patch) and see if I 
 can recreate the issue.
 
 — Jon
 
 On Aug 19, 2014, at 2:52 PM, Rui Zhang rzh...@vertica.com wrote:
 
 Tried so many methods.
 Changing the signature algorithm to sha256 in the java code and adding the 
 cert to trusted list.
 
 All does not work and the same error.
 
 The certificate is generated in /tmp/work/security so I don't know what is 
 wrong. Is there a self-check test for me to know whether I configure 
 correctly or not?
 
 Thanks.
 
 On 08/15/2014 12:14 PM, Jon Maron wrote:
 OK.  Make sure the AM logs indicate that the openssl commands are 
 succeeding.  You should see log statements displaying some openssl command 
 or statements indicating if the server certificate exists.
 
 On Aug 15, 2014, at 12:05 PM, Rui Zhang rzh...@vertica.com wrote:
 
 Having done all of these but still got this error. It also says that it 
 is not verified when I opened the link in the browser.
 
 Maybe there is some issue with my openssl. I will try to solve and report 
 to you my progress.
 
 Thanks.
 
 
 On 08/15/2014 11:17 AM, Jon Maron wrote:
 - the agent code has been modified to communicate via SSL.  That code is 
 downloaded to each launched container from /slider/agent HDFS folder 
 (slider-agent.tar.gz).  If you have installed an up to date version of 
 slider you’ll need to update that file in HDFS.



Re: Slider Token Renewal for long running apps

2014-08-18 Thread Jon Maron


On Aug 18, 2014, at 5:02 AM, Steve Loughran ste...@hortonworks.com wrote:

 we can just flip the bit on the config you want to use to create the hdfs
 account ... take the normal config, patch it, then ask for the FS. I've
 used this in the past; it works happily.

I see.  I’m going to give that a shot.  I’ll reuse that config for the token 
renewal/expiry interactions and see if the scheme works.

 
 Its main role is to cope with code that calls FileSystem.close() while other
 clients are using it.
 
 
 On 17 August 2014 21:01, Jon Maron jma...@hortonworks.com wrote:
 
 
 On Aug 17, 2014, at 10:35 AM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 On 16 August 2014 23:18, Jon Maron jma...@hortonworks.com wrote:
 
 Once a FileSystem existed for user ‘hdfs’ in the file system cache,
 invocations of FileSystem.get(Configuration) (
 
 
 you can turn the cache off for a filesystem, this gives you a new
 instance.. look in FileSystem.get():
 
   String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
   if (conf.getBoolean(disableCacheName, false)) {
     return createFileSystem(uri, conf);
   }
 
 i.e add to the conf
 
 fs.hdfs.impl.disable.cache=true
 
 and you get a new one
 
 I suppose that would address the mixed-user issue - the hdfs principal
 would only be leveraged for the token related invocations and the instance
 would not be accessible to the app identity (‘jon’).  But is that a
 configuration setting we could require users to leverage in a secure
 deployment?
 
 



Re: Slider Token Renewal for long running apps

2014-08-17 Thread Jon Maron


On Aug 17, 2014, at 10:35 AM, Steve Loughran ste...@hortonworks.com wrote:

 On 16 August 2014 23:18, Jon Maron jma...@hortonworks.com wrote:
 
 Once a FileSystem existed for user ‘hdfs’ in the file system cache,
 invocations of FileSystem.get(Configuration) (
 
 
 you can turn the cache off for a filesystem, this gives you a new
 instance.. look in FileSystem.get():
 
   String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
   if (conf.getBoolean(disableCacheName, false)) {
     return createFileSystem(uri, conf);
   }
 
 i.e add to the conf
 
 fs.hdfs.impl.disable.cache=true
 
 and you get a new one

I suppose that would address the mixed-user issue - the hdfs principal would 
only be leveraged for the token related invocations and the instance would not 
be accessible to the app identity (‘jon’).  But is that a configuration setting 
we could require users to leverage in a secure deployment?
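
The effect of the flag can be illustrated with a toy model. This is not Hadoop code: `ToyFileSystem` and `get_filesystem` are hypothetical names, and keying the cache on (scheme, user) is a simplification of whatever key the real cache uses. The sketch only shows why a per-call config with the flag set yields a private instance without touching the shared one.

```python
class ToyFileSystem:
    """Stand-in for a filesystem client bound to one scheme and one user."""

    def __init__(self, scheme, user):
        self.scheme = scheme
        self.user = user


_CACHE = {}  # shared instances, keyed by (scheme, user) in this toy model


def get_filesystem(scheme, user, conf):
    """Mimic the cache bypass in the quoted FileSystem.get() snippet."""
    if conf.get("fs.%s.impl.disable.cache" % scheme, False):
        # Cache disabled for this scheme: always build a fresh instance.
        return ToyFileSystem(scheme, user)
    key = (scheme, user)
    if key not in _CACHE:
        _CACHE[key] = ToyFileSystem(scheme, user)
    return _CACHE[key]


if __name__ == "__main__":
    app_conf = {}                                    # normal config
    token_conf = dict(app_conf)                      # copy, then patch
    token_conf["fs.hdfs.impl.disable.cache"] = True
    shared = get_filesystem("hdfs", "jon", app_conf)
    private = get_filesystem("hdfs", "hdfs", token_conf)
    print(shared is private)  # the two callers never share an instance
```

In this model the flag lives in a patched copy of the configuration used only for the token-related calls, so the deployment-wide configuration would not need to change; whether that holds for a real secure deployment is the open question raised above.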

 



Re: Slider Token Renewal for long running apps

2014-08-17 Thread Jon Maron





 On Aug 17, 2014, at 5:35 PM, Sumit Mohanty smoha...@hortonworks.com wrote:
 
 So how does the solution look for two users? Is it a single keytab with two
 principals?

That would be my approach. 

 What sort of visibility will each user have of the other user's
 resources?

Well - each AM instance would only be associated with a single user at a time. 
If a situation did somehow arise where more than one user was logged in it 
would still work since the tokens are credentials associated with a given 
subject. We would, however, require a more robust renewal mechanism that could 
handle multiple token renewal periods.  

 
 
 On Sun, Aug 17, 2014 at 1:01 PM, Jon Maron jma...@hortonworks.com wrote:
 
 
 On Aug 17, 2014, at 10:35 AM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 On 16 August 2014 23:18, Jon Maron jma...@hortonworks.com wrote:
 
 Once a FileSystem existed for user ‘hdfs’ in the file system cache,
 invocations of FileSystem.get(Configuration) (
 
 
 you can turn the cache off for a filesystem, this gives you a new
 instance.. look in FileSystem.get():
 
   String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
   if (conf.getBoolean(disableCacheName, false)) {
     return createFileSystem(uri, conf);
   }
 
 i.e add to the conf
 
 fs.hdfs.impl.disable.cache=true
 
 and you get a new one
 
 I suppose that would address the mixed-user issue - the hdfs principal
 would only be leveraged for the token related invocations and the instance
 would not be accessible to the app identity (‘jon’).  But is that a
 configuration setting we could require users to leverage in a secure
 deployment?
 
 

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: ssl error after changing to slider-dev branch

2014-08-15 Thread Jon Maron

I’m having a hard time understanding exactly what your current setup is, so 
here are the details:

- The agent code has been modified to communicate via SSL.  That code is 
downloaded to each launched container from the /slider/agent HDFS folder 
(slider-agent.tar.gz).  If you have installed an up-to-date version of Slider, 
you’ll need to update that file in HDFS.
- The AM has been updated to support SSL.  If you’ve reinstalled Slider, then 
that code should be added to the AM as a local resource by YARN.

If those two conditions are met, then perhaps you are actually having an issue 
with openssl - it is being leveraged for certificate and keystore generation on 
the server side (by default it is one-way SSL, so there should be no need for 
the client to generate these resources).

For more information on the SSL setup see 
http://slider.incubator.apache.org/design/ssl_implementation.html
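
One check that can be made on the JVM side, assuming the handshake failure comes from a protocol version mismatch between the agent's openssl and the AM (an assumption, not something confirmed in this thread), is to list which TLS/SSL protocol versions the JVM supports and enables by default. The sketch below uses only standard `javax.net.ssl` calls; the class name `TlsProtocolCheck` is made up for illustration.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;

/** Lists the TLS/SSL protocol versions known to the default SSLContext of this JVM. */
class TlsProtocolCheck {

    /** Protocols the JVM can speak at all. */
    static String[] supportedProtocols() {
        try {
            return SSLContext.getDefault().getSupportedSSLParameters().getProtocols();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("no default SSLContext available", e);
        }
    }

    /** Protocols enabled when no explicit configuration is applied. */
    static String[] defaultProtocols() {
        try {
            return SSLContext.getDefault().getDefaultSSLParameters().getProtocols();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("no default SSLContext available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("supported:          " + String.join(", ", supportedProtocols()));
        System.out.println("enabled by default: " + String.join(", ", defaultProtocols()));
    }
}
```

Comparing this list against what the agent host's openssl build offers may help narrow down whether the two sides share a protocol version.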

— Jon

On Aug 15, 2014, at 11:11 AM, Rui Zhang rzh...@vertica.com wrote:

 The first error solved. Thanks, Steve.
 
 But the ssl error still exists. BTW, I am not using Ambari so is it possible 
 that I missed some configuration related to SSL in Yarn?
 
 Thanks.
 
 
 On 08/15/2014 10:09 AM, Steve Loughran wrote:
 ok, try now.
 
 
 On 15 August 2014 11:29, Steve Loughran ste...@hortonworks.com wrote:
 
 On 15 August 2014 06:59, Rui Zhang rzh...@vertica.com wrote:
 
 Exception in thread "main" java.lang.NoClassDefFoundError: com/codahale/metrics/MetricRegistry
 at org.apache.slider.server.appmaster.SliderAppMaster.<clinit>(SliderAppMaster.java:206)
 Caused by: java.lang.ClassNotFoundException: com.codahale.metrics.MetricRegistry
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 1 more
 
 no, I've guessed the cause.. it's not uploading that JAR...my fault
 
 Will fix ASAP
 
 
 
 -- 
 Rui Zhang
 Software engineer Intern
 Vertica, an HP Company
 rzh...@vertica.com



Re: ssl error after changing to slider-dev branch

2014-08-15 Thread Jon Maron

OK.  Make sure the AM logs indicate that the openssl commands are succeeding.  
You should see log statements displaying some openssl command or statements 
indicating if the server certificate exists. 

On Aug 15, 2014, at 12:05 PM, Rui Zhang rzh...@vertica.com wrote:

 I have done all of these but still get this error. When I open the link in 
 the browser, it also says that it is not verified.
 
 Maybe there is some issue with my openssl. I will try to solve it and report 
 my progress to you.
 
 Thanks.
 
 
 On 08/15/2014 11:17 AM, Jon Maron wrote:
 - the agent code has been modified to communicate via SSL.  That code is 
 downloaded to each launched container from /slider/agent HDFS folder 
 (slider-agent.tar.gz).  If you have installed an up to date version of 
 slider you’ll need to update that file in HDFS.
 
 -- 
 Rui Zhang
 Software engineer Intern
 Vertica, an HP Company
 rzh...@vertica.com
 



Re: ssl error after changing to slider-dev branch

2014-08-14 Thread Jon Maron

Can you provide more of a stack trace from either the agent logs or the 
application master log?  Thanks!

Going Mobile


 On Aug 14, 2014, at 3:54 PM, Rui Zhang rzh...@vertica.com wrote:
 
 Hi, everyone,
 
 I have switched to the dev branch, but now I get this error:
 
 ERROR 2014-08-14 15:52:28,244 NetUtil.py:52 - [Errno 8] _ssl.c:510: EOF 
 occurred in violation of protocol
 ERROR 2014-08-14 15:52:28,244 NetUtil.py:54 - SSLError: Failed to connect. 
 Please check openssl library versions.
 
 How can I solve it?
 Thanks
 
 -- 
 Rui Zhang
 Software engineer Intern
 Vertica, an HP Company
 rzh...@vertica.com
 

