On Apr 8, 2013, at 11:31 , JiHyoun Park wrote:
Dear Matthieu
Yes, I am trying to port s4-yarn to 0.6.0.
The -testMode option is what 0.6.0 now has in org.apache.s4.tools.Deploy.java,
// Explicitly shutdown the JVM since Gradle leaves non-daemon
// threads running that delay the termination
if (!deployArgs.testMode) {
    System.exit(0);
}
just like the -shutdown option in the same class file of S4-25.
// Explicitly shutdown the JVM since Gradle leaves non-daemon
// threads running that delay the termination
if (deployArgs.shutdown) {
    System.exit(0);
}
But the difference is
@Parameter(names = "-testMode", description = "Special mode for regression testing", hidden = true)
@Parameter(names = "-shutdown", description = "Shutdown JVM after deployment. Useful to avoid waiting for remaining long running threads from Gradle", arity = 1)
I tried to pass "-testMode=true" to Deploy.main()
String [] argDeploy = {"-s4r=" + s4r_path_HDFS,
"-cluster=" + cluster_name,
"-appName=" + application_name,
"-testMode=true"
};
Deploy.main(argDeploy);
but got an error.
Cannot parse arguments: class com.beust.jcommander.ParameterException -> Was
passed main parameter 'true' but no main parameter was defined
With JCommander, the CLI parser we use, "-testMode" is a boolean parameter with
an arity of 0. So either you specify it as a bare "-testMode" (it takes the
value "true"), or you omit it (it takes the default value "false").
The error you are reporting might come from that.
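As a minimal sketch of what that means for the calling code, assuming the rest of the argument array stays as posted: the flag is added bare when test mode is wanted and left out otherwise. The class name, the helper buildDeployArgs, and the sample values in main are hypothetical; only the option names come from this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class DeployArgsSketch {

    // Hypothetical helper: builds the argument array that would be handed to
    // Deploy.main(). Only the option names are taken from the thread.
    static String[] buildDeployArgs(String s4rPath, String cluster, String appName, boolean testMode) {
        List<String> args = new ArrayList<>();
        args.add("-s4r=" + s4rPath);
        args.add("-cluster=" + cluster);
        args.add("-appName=" + appName);
        if (testMode) {
            // Arity-0 boolean: the presence of the flag means true,
            // so it is passed without an "=true" suffix.
            args.add("-testMode");
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // Sample values are placeholders, not real paths or cluster names.
        String[] argDeploy = buildDeployArgs("hdfs://namenode/apps/app.s4r", "cluster1", "myApp", true);
        System.out.println(String.join(" ", argDeploy));
        // In the real client, this array would then be passed to Deploy.main(argDeploy).
    }
}
```

With "-testMode=true", the trailing "true" is left over after the flag is consumed and is treated as a main parameter, which would explain the exception above.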
Hope this helps,
Matthieu
Usage
Usage: <main class> [options]
  Options:
    -a, -appClass               Full class name of the application class
                                (extending App or AdapterApp)
  * -appName                    Name of S4 application.
  * -c, -cluster                Logical name of the S4 cluster
    -debug                      Display debug information from the build system
                                Default: false
    -gradleOpts                 gradle system properties (as in GRADLE_OPTS
                                environment properties) passed to gradle
                                scripts
                                Default: []
    -help                       usage
                                Default: false
    -modulesClasses, -emc, -mc  Fully qualified class names of custom modules
                                Default: []
    -modulesURIs, -mu           URIs for fetching code of custom modules
                                Default: []
    -namedStringParameters, -p  Comma-separated list of inline configuration
                                parameters. Syntax:
                                '-p=name1=value1,name2=value2 '
                                Default: []
    -s4r                        URI to existing s4r file
    -timeout                    Connection timeout to Zookeeper, in ms
                                Default: 10000
    -zk                         ZooKeeper connection string
                                Default: localhost:2181
Best Regards
Jihyoun
On Mon, Apr 8, 2013 at 4:23 PM, Matthieu Morel <[email protected]> wrote:
On Apr 8, 2013, at 06:17 , JiHyoun Park wrote:
Dear Matthieu
What we need for the Yarn integration is just to include 2 hdfs-deploy-related
classes, which were developed in S4-25, in the s4 core-deploy package.
- org.apache.s4.deploy.HdfsFetcherModule.java
- org.apache.s4.deploy.HdfsS4RFetcher.java
And a simple modification to org.apache.s4.core.util.RemoteFileFetcher.java so
that it recognizes "hdfs" as one of the s4r download sources.
if ("hdfs".equalsIgnoreCase(scheme)) {
    return new HdfsArchiveFetcher().fetch(uri);
}
Hi Jihyoun,
adding Yarn/Hadoop dependencies in s4-core is something we want to avoid, so
that we don't force a specific version of Hadoop.
Instead, for S4 0.6, we could actually inject the fetchers through a custom
module. We'd ship the custom module separately from s4-core, avoiding the
dependency coupling issue.
Can you add a ticket for this? Thanks!
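To illustrate the decoupling idea in plain Java (this is not S4's actual module API): the core could keep a registry of fetchers keyed by URI scheme, and a separately shipped module would register the "hdfs" one. Every name in this sketch (FetcherRegistrySketch, ArchiveFetcher, register) is hypothetical, and the HDFS fetcher is a stub.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class FetcherRegistrySketch {

    // Hypothetical interface standing in for an archive fetcher abstraction.
    // A real fetcher would likely declare IOException; omitted to keep this short.
    interface ArchiveFetcher {
        InputStream fetch(URI uri);
    }

    private final Map<String, ArchiveFetcher> fetchersByScheme = new HashMap<>();

    // A module shipped outside the core would call this to contribute its fetcher,
    // so the core never needs a compile-time dependency on Hadoop.
    void register(String scheme, ArchiveFetcher fetcher) {
        fetchersByScheme.put(scheme.toLowerCase(Locale.ROOT), fetcher);
    }

    // Picks the fetcher by URI scheme, case-insensitively.
    InputStream fetch(URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URI has no scheme: " + uri);
        }
        ArchiveFetcher fetcher = fetchersByScheme.get(scheme.toLowerCase(Locale.ROOT));
        if (fetcher == null) {
            throw new IllegalArgumentException("No fetcher registered for scheme: " + scheme);
        }
        return fetcher.fetch(uri);
    }

    public static void main(String[] args) {
        FetcherRegistrySketch registry = new FetcherRegistrySketch();
        // Stub: a real HDFS fetcher would open the file through the Hadoop client.
        registry.register("hdfs", uri -> new ByteArrayInputStream(new byte[0]));
        InputStream in = registry.fetch(URI.create("hdfs://namenode/apps/app.s4r"));
        System.out.println("fetched stream: " + (in != null));
    }
}
```

Compared with a hard-coded check on the "hdfs" scheme inside the core, this keeps the scheme lookup in the core while the Hadoop-dependent code lives entirely in the contributed module.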
I also would like to ask you one more favour.
Can we have the "-shutdown" option back in org.apache.s4.tools.Deploy.java, to
avoid the automatic shutdown of the S4 application after deployment?
I tried to use the "-testMode" option, which seemed to act just like the
"-shutdown" option but my s4 application couldn't recognize the option.
If I understand correctly, you tried to port s4-yarn to 0.6.0?
Did you add the -testMode option in replacement of -shutdown=false here
https://github.com/apache/incubator-s4/blob/S4-25/subprojects/s4-yarn/src/main/java/org/apache/s4/tools/yarn/S4YarnClient.java#L387
?
Note that the S4 app being shut down without this option is actually a side
effect of running the S4 deployment/configuration tool on Yarn: we need to
prevent System.exit calls since we are running in a contained environment.
Also, if you have a working port of S4-25 to S4 0.6, you could submit a patch
and we could integrate it. (if you are still iterating you can also fork the
project on github and share your code of the port there, so we can provide
feedback).
Thanks,
Matthieu
Best Regards
Jihyoun
On Thu, Apr 4, 2013 at 5:26 PM, Matthieu Morel <[email protected]> wrote:
Hi,
Note that S4 0.5 was a complete refactoring, therefore its main objective was
to provide a functional implementation. Thus there was room for improvements
and the focus of the 0.6 release was on performance and usability.
Most performance improvements in S4 0.6 come from:
- adding metrics to identify bottlenecks
- improving serialization and deserialization
- minimizing buffer copies (and pressure on the garbage collector)
- leveraging multithreading and async processing, notably by updating Netty
pipelines
Regards,
Matthieu
On Apr 4, 2013, at 07:01 , Siddharth wrote:
Hi - Can the development team highlight the exact solution/fix that made it
possible for the 0.6 release to be so fast compared to the earlier release?
Thanks in advance,
Siddharth
________________________________
From: Matthieu Morel [mailto:[email protected]]
Sent: Wednesday, April 03, 2013 3:02 PM
To: [email protected]
Subject: Re: S4-0.6.0 and Hadoop Yarn
On Apr 2, 2013, at 19:46 , Jeryl Cook wrote:
"handle 200K+ messages per sec", in one instance? Or do you mean clustered?
This is for processing small events injected into 1 stream on 1 node. By using
more streams and more nodes, the overall throughput can be considerably higher.
Note that this is a baseline with a basic PE graph (1 injector and 1 PE
prototype). Performance in practice will be impacted by the complexity of
the application and the nature of the processing, the hardware and allocated
resources, the size and complexity of messages, etc.
A benchmarking framework is included in the distribution, so you can reproduce
the experiments.
Regards,
Matthieu
On Mon, Apr 1, 2013 at 10:42 PM, JiHyoun Park <[email protected]> wrote:
Hi
I am testing the newest release of S4.
It's fantastic that the stream throughput of S4 0.6.0 has been improved to
handle 200K+ messages per sec.!
However, it seems that the S4-25 branch - deploying S4 applications with Yarn -
is not included in the 0.6.0 package yet.
I already built a system to run S4 applications on Yarn and want to migrate its
S4 framework from 0.5.0 to 0.6.0.
How can I use the 'deploying S4 applications with Yarn' feature on S4 0.6.0?
Best Regards
Jihyoun
--
Jeryl Cook
Founder & Chief Executive Officer
VanitySoft, Inc.
A Geo Business Intelligence Technology Consulting Firm
www.vanity-soft.com<http://www.vanity-soft.com/>
www.linkedin.com/in/jerylcook<http://www.linkedin.com/in/jerylcook>
Get answers to "who knew what, when, and where"... and everything in between.